Re: 1st draft of complete class-based std.random successor

2014-04-03 Thread Joseph Rushton Wakeling

On Tuesday, 25 March 2014 at 00:08:27 UTC, bearophile wrote:
I don't mind, I am happy :-) Thank you for adding a sorely 
needed function.


It's been merged :-)


Re: 1st draft of complete class-based std.random successor

2014-03-25 Thread Joseph Rushton Wakeling

On Tuesday, 25 March 2014 at 00:08:27 UTC, bearophile wrote:
I don't mind, I am happy :-) Thank you for adding a sorely 
needed function.


You are very kind, and far too modest. :-)


Re: 1st draft of complete class-based std.random successor

2014-03-24 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


On that note: 
https://github.com/D-Programming-Language/phobos/pull/2050


Hope you don't mind me jumping ahead of your existing PR on this 
-- it's been inactive so I didn't know if you were planning on 
following up.  I'd be very happy to see you take what's good from 
the above into your own patchset, we need to land a contribution 
from you in Phobos :-)


Re: 1st draft of complete class-based std.random successor

2014-03-24 Thread bearophile

Joseph Rushton Wakeling:

Hope you don't mind me jumping ahead of your existing PR on 
this -- it's been inactive so I didn't know if you were 
planning on following up.


I don't mind, I am happy :-) Thank you for adding a sorely needed 
function.

The useless patch I opened should be closed.

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-23 Thread Joseph Rushton Wakeling

On Saturday, 22 March 2014 at 23:56:35 UTC, bearophile wrote:

They seem good.


Excellent!

There may need to be some attention to the internals of 
uniform01.  Its correctness depends on whether one can always 
trust a float-based RNG to return values in [min, max) or whether 
[min, max] is also going to be supplied by some.



More ideas:

Three suggestions for std.random:
https://d.puremagic.com/issues/show_bug.cgi?id=4851


I think all std.random functions now support a default RNG.  
There were some bugs related to that (e.g. the can't use 
Xorshift one) that I fixed last year.


The problem you identify with,

int r = randomCover(data, rndGen).front;

always returning the same value, is down to the fact that rndGen 
is being copied inside the RandomCover struct by value, so of 
course the original rndGen is never updated and each of these 
calls will produce the same result.  The new std.random2 fixes 
that, because the RNGs are reference types.


However, I'd have thought that

int r = data.sample(1, rndGen).front;

would have been a more efficient way to implement choice, as it 
can operate on any input range, as long as it has the .length 
property; and it ought to be _much_ faster than even a single 
call to randomCover.


One could always use this as a default option, with a 
specialization where data is a RandomAccessRange to use the more 
efficient


int r = data[uniform![)(0, data.length)];


Strongly pure random generator:
https://d.puremagic.com/issues/show_bug.cgi?id=5249


.front and .popFront at least are pure for _all_ the RNGs 
currently implemented in std.random2.generator.  See e.g.:

https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L266-L272
https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L506-L517
https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L821-L834

Of course this is not strongly pure in line with your request, 
but it should enable use of these RNGs in many other scenarios 
where purity is important.


I hope a gaussian (normal distribution) generator is planned or 
present.


https://github.com/WebDrake/std.random2/blob/master/std/random2/distribution.d#L326

This is a range implementation; there will also be a function 
implementation, which will probably follow the inefficient 
Box-Muller variant that uses 2 uniform random variates to 
generate a single normal variate (as per the example you posted 
in your feature request).


Re: 1st draft of complete class-based std.random successor

2014-03-23 Thread bearophile

Joseph Rushton Wakeling:

   int r = data[uniform![)(0, data.length)];


D also accepts:

immutable r = data[uniform![)(0, $)];

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-23 Thread bearophile

Joseph Rushton Wakeling:


I think all std.random functions now support a default RNG.


Is the issue is already fixed in std.random you can close it :-)



However, I'd have thought that

int r = data.sample(1, rndGen).front;

would have been a more efficient way to implement choice, as 
it can operate on any input range, as long as it has the 
.length property; and it ought to be _much_ faster than even a 
single call to randomCover.


One could always use this as a default option, with a 
specialization where data is a RandomAccessRange to use the 
more efficient


int r = data[uniform![)(0, data.length)];


The best thing is to add an efficient choice() function, so no 
efficiency mistake happens :-)




Strongly pure random generator:
https://d.puremagic.com/issues/show_bug.cgi?id=5249


.front and .popFront at least are pure for _all_ the RNGs 
currently implemented in std.random2.generator.  See e.g.:

https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L266-L272
https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L506-L517
https://github.com/WebDrake/std.random2/blob/master/std/random2/generator.d#L821-L834

Of course this is not strongly pure in line with your request, 
but it should enable use of these RNGs in many other scenarios 
where purity is important.


So you are saying that those RNGs are already weakly pure and 
they can't become strongly pure because they take the engine as 
mutable class reference. And even if you design a very small 
random engine that can be created every time you call a random 
generator, the API of all the random functions is not compatible 
with it. So it's not a simple problem...



This is a range implementation; there will also be a function 
implementation, which will probably follow the inefficient 
Box-Muller variant that uses 2 uniform random variates to 
generate a single normal variate (as per the example you posted 
in your feature request).


A possibility is to also add a less precise (more approximate) 
but faster function implementation.



Are the ddocs produced by std.random2 online somewhere?

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-23 Thread Philippe Sigaud
On Sun, Mar 23, 2014 at 11:17 AM, bearophile bearophileh...@lycos.com wrote:
 Joseph Rushton Wakeling:

int r = data[uniform![)(0, data.length)];


 D also accepts:

 immutable r = data[uniform![)(0, $)];

Really? The '$' part works?


Re: 1st draft of complete class-based std.random successor

2014-03-23 Thread Joseph Rushton Wakeling

On Sunday, 23 March 2014 at 10:15:32 UTC, bearophile wrote:

Is the issue is already fixed in std.random you can close it :-)


Well, your request for a choice method is still open ... :-)

The best thing is to add an efficient choice() function, so no 
efficiency mistake happens :-)


Sure, I'm simply raising a couple of simple internal 
implementations that could be used for an effective first draft 
of that function.


So you are saying that those RNGs are already weakly pure and 
they can't become strongly pure because they take the engine as 
mutable class reference. And even if you design a very small 
random engine that can be created every time you call a random 
generator, the API of all the random functions is not 
compatible with it. So it's not a simple problem...


I think I need to make some detailed research into how Haskell 
and other functional languages handle randomness before 
commenting here.  What it does seem to me at this stage is that 
while a weakly pure range-based RNG is readily possible (as 
implemented in std.random2.generator now), I'm not sure that the 
range-based approach typical of Phobos plays nicely with strong 
purity where random number generation is concerned.


A possibility is to also add a less precise (more approximate) 
but faster function implementation.


Again, this is something I'll look into.  I need to re-read the 
paper on gaussian-distribution algorithms that I linked to 
earlier in this thread, but my recollection is that the 
speed/precision tradeoff is something of a false dichotomy given 
the algorithms out there now; so a good range-based solution 
(which stores internal state) will probably be able to provide 
high-quality normal variates faster than a low-quality, purely 
function-based implementation.



Are the ddocs produced by std.random2 online somewhere?


Not yet.  I can make that happen :-)


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread bearophile

Joseph Rushton Wakeling:

Latest patches rename randomSample = sample, again offering a 
documented alias to assist migration.


Perhaps it's better to not document this alias.


permutation seems good to me (or permute?), but perhaps 
others have suggestions or can point to a typical naming 
practice from other languages?


I'd like a permutations() range in std.range or 
std.combinatorics.  permutation() sounds a bit too much close, 
despite it is inside another module.


Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread Joseph Rushton Wakeling
Latest patches rename randomSample = sample, again offering a 
documented alias to assist migration.


It would be nice to complete the set and eliminate randomCover, 
but in this case cover seems too vague a name to use.  Any 
suggestions for alternatives?  I wasn't able to readily find an 
equivalent in other random number libraries.


permutation seems good to me (or permute?), but perhaps 
others have suggestions or can point to a typical naming practice 
from other languages?


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread Joseph Rushton Wakeling

On Saturday, 22 March 2014 at 20:09:00 UTC, bearophile wrote:

Perhaps it's better to not document this alias.


For now it will be documented, for clarity if nothing else.  
Whether that documentation makes it into a Phobos submission, I 
think should depend on formal review.


I'd like a permutations() range in std.range or 
std.combinatorics.  permutation() sounds a bit too much close, 
despite it is inside another module.


How does your desired concept relate to the existing 
std.algorithm.nextPermutation ... ?


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread bearophile

Joseph Rushton Wakeling:

How does your desired concept relate to the existing 
std.algorithm.nextPermutation ... ?


The API of the lazy permutations/combinations ranges is similar 
to the one I have written here:

http://rosettacode.org/wiki/Permutations#Fast_Lazy_Version

That is also very similar to the permutations/combinations here 
(with the r optional argument):

http://docs.python.org/2/library/itertools.html


To the one in Phobos is used like:

auto items = [1, 2, 3];
do
writeln(items);
while (items.nextPermutation());


The permutations/combinations range is used like (the boolean 
template argument is to specify if it has to copy the buffer or 
not):


[1, 2, 3].permutations!false.writeln;

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread Joseph Rushton Wakeling
Latest patches just pushed to repo make the randomSample = 
sample change and introduce a fast uniform01 and 
uniform01Distribution :-)


Re: 1st draft of complete class-based std.random successor

2014-03-22 Thread bearophile

Joseph Rushton Wakeling:

Latest patches just pushed to repo make the randomSample = 
sample change and introduce a fast uniform01 and 
uniform01Distribution :-)


They seem good.

More ideas:

Three suggestions for std.random:
https://d.puremagic.com/issues/show_bug.cgi?id=4851

Strongly pure random generator:
https://d.puremagic.com/issues/show_bug.cgi?id=5249

I hope a gaussian (normal distribution) generator is planned or 
present.


Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-21 Thread ponce
On Thursday, 20 March 2014 at 21:17:33 UTC, Joseph Rushton 
Wakeling wrote:

On Thursday, 20 March 2014 at 08:30:09 UTC, ponce wrote:
Related: please consider using parts of SimpleRNG the 
excellent work of John D. Cook which provides many random 
distributions in a compact and documented way.


https://github.com/p0nce/gfm/blob/master/math/gfm/math/simplerng.d 
(here a port)


Good call, I'll take a close look at that.  Can you provide me 
with a link to the original project too?  (Yes, I can just 
Google it, I'm being lazy:-)


http://www.johndcook.com/SimpleRNG.h
http://www.johndcook.com/SimpleRNG.cpp

You will find that there is no license information.
But the author intended this as public domain, he will confirm if 
send an e-mail.


Re: 1st draft of complete class-based std.random successor

2014-03-21 Thread Andrea Fontana

On Friday, 21 March 2014 at 16:01:28 UTC, ponce wrote:
On Thursday, 20 March 2014 at 21:17:33 UTC, Joseph Rushton 
Wakeling wrote:

On Thursday, 20 March 2014 at 08:30:09 UTC, ponce wrote:
Related: please consider using parts of SimpleRNG the 
excellent work of John D. Cook which provides many random 
distributions in a compact and documented way.


https://github.com/p0nce/gfm/blob/master/math/gfm/math/simplerng.d 
(here a port)


Good call, I'll take a close look at that.  Can you provide me 
with a link to the original project too?  (Yes, I can just 
Google it, I'm being lazy:-)


http://www.johndcook.com/SimpleRNG.h
http://www.johndcook.com/SimpleRNG.cpp

You will find that there is no license information.
But the author intended this as public domain, he will confirm 
if send an e-mail.


Hey he uses MWC algorithm! (but not the improved CMWC)



Re: 1st draft of complete class-based std.random successor

2014-03-21 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 00:39:43 UTC, bearophile wrote:
It's the best chance to improve naming, so do not throw it away 
for nothing:

https://d.puremagic.com/issues/show_bug.cgi?id=9106


I think the following patch should fix that for you:
https://github.com/WebDrake/std.random2/commit/fb5429de77b3c1f7fe3968fd0bd92209c9021f31

I've also made shuffle composable as per your request.  Looks 
good? :-)


Re: 1st draft of complete class-based std.random successor

2014-03-21 Thread bearophile

Joseph Rushton Wakeling:


I think the following patch should fix that for you:
https://github.com/WebDrake/std.random2/commit/fb5429de77b3c1f7fe3968fd0bd92209c9021f31

I've also made shuffle composable as per your request.  Looks 
good? :-)


Seems good. Onward! :-)

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread ponce
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton 
Wakeling wrote:
   * std.random2.distribution, random distributions such as 
uniform,

 normal, etc.;


Related: please consider using parts of SimpleRNG the excellent 
work of John D. Cook which provides many random distributions in 
a compact and documented way.


https://github.com/p0nce/gfm/blob/master/math/gfm/math/simplerng.d 
(here a port)


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread monarch_dodra

On Thursday, 20 March 2014 at 01:32:41 UTC, Chris Williams wrote:
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton 
Wakeling wrote:

Hello all,

As some of you may already know, monarch_dodra and I have 
spent quite a lot of time over the last year discussing the 
state of std.random.  To cut a long story short, there are 
significant problems that arise because the current RNGs are 
value types rather than reference types.


Any chance that you could describe them? I was about to resume 
porting the dcrypt library into Phobos, and had intended to 
flip the classes into structs, to match what the rest of the 
library was doing.


The issue isn't class vs struct, but rather value semantic vs 
reference semantic (classes are always ref, but structs can be 
either). Basically, if you make a copy, and modify the copy, will 
the original range be modified?


The problem with value semantics is that it always un-expected 
duplication of the range, which is a critical blocker problem as 
far as random goes.


The tell-tale usecase is:

//
auto g = rndGen();
g.take(10).writeln();
g.take(10).writeln();
//

This will write the same sequence... TWICE!


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread monarch_dodra

On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:

Joseph Rushton Wakeling:
  * std.random2.adaptor, random adaptors such as 
randomShuffle,

randomSample, etc.


Please don't use stuttering names like 
std.random2.randomShuffle. std.random2.shuffle is enough.


Agreed.

`randomShuffle` can be made a deprecated alias: This way, 
random2 should still be mostly drop in replacement, but we 
won't drag along the bad names.


My own feeling is that ultimately it is a responsibility of 
the language to offer nice ways to allocate classes without 
necessarily relying on new or the GC.


I don't think the language is yet there. So I think currently 
this is not a good idea.


I think there is 0 doubt that reference semantics is the way to 
go. An advantage of using class is that it is still *possible* to 
place them on the stack with Scoped, or with some future language 
mechanic. On the other hand, if we implement as reference 
structs, then that's that.


Furthermore, even in terms of performance, I think a heap 
allocated PRNG will still flat-out beat the value based one, if 
only because of the size of the damn thing.


That said, being able to allocate them on the malloc heap, and 
not the GC heap, would be (IMO) also a valid design.


A simple and dumb design might be to still implement them with 
value semantic but:

1. Disable postblit.
2. Make .save() return a Random*
This would mean
1. No dangers of accidental copy.
2. Range* is a ForwardRange.
3. Trivially allows GC/malloc/stack allocation.
With good aliases (alias Random = RadomImpl*;), and a make! 
template we could make the default useage transparent to this 
mechanism yet make it easy to get our hands under the hood.


But at this point, we are really beating around the bush on this 
issue. There are two things for sure:

1. Reference semantics by default.
2. There comes a point where we have to move forward.

I didn't check the code yet, but a middle ground could be to 
make all constructors private, and disable T.init. Then, we force 
construction through a make! template.


This might not be what's most convenient, but it would allow us 
to potentially change the design at a later stage, without 
breaking user code.


Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


AFAIK, the allocation issue is only for ranges? uniform is just 
a function, I don't think it affected by the issue. Even if you 
are operating on a passed range, either ranges are reference 
semantics, and you take by value, or they are value semantic, and 
you take by ref. Either way, you have to pay for the indirection.


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread bearophile

monarch_dodra:

I think there is 0 doubt that reference semantics is the way to 
go.


I agree.


Furthermore, even in terms of performance, I think a heap 
allocated PRNG will still flat-out beat the value based one, if 
only because of the size of the damn thing.


OK.


Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


AFAIK, the allocation issue is only for ranges?


Here I was not talking about allocations:
https://d.puremagic.com/issues/show_bug.cgi?id=5240

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Chris Williams

On Thursday, 20 March 2014 at 08:22:37 UTC, monarch_dodra wrote:
The issue isn't class vs struct, but rather value semantic vs 
reference semantic (classes are always ref, but structs can 
be either).


That's only completely true if structs are referred to by 
pointer. ref parameters/returns aren't quite sufficient to keep a 
struct acting as a reference for all purposes.


But good example. I'll have to consider that when I port the 
cryptographic prngs.


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Andrea Fontana
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton 
Wakeling wrote:

Hello all,

As some of you may already know, monarch_dodra and I have spent 
quite a lot of time over the last year discussing the state of 
std.random.  To cut a long story short, there are significant 
problems that arise because the current RNGs are value types 
rather than reference types.  We had quite a lot of back and 
forth on different design ideas, with a lot of helpful input 
from others in the community, but at the end of the day there 
are really only two broad approaches: create structs that 
implement reference semantics internally, or use classes.  So, 
as an exercise, I decided to create a class-based std.random.


The preliminary (but comprehensive) results of this are now 
available here:

https://github.com/WebDrake/std.random2

Besides re-implementing random number generators as classes 
rather than structs, the new code splits std.random2 into a 
package of several different modules:


   * std.random2.generator, pseudo-random number generators;

   * std.random2.device, non-deterministic random sources;

   * std.random2.distribution, random distributions such as 
uniform,

 normal, etc.;

   * std.random2.adaptor, random adaptors such as 
randomShuffle,

 randomSample, etc.

   * std.random2.traits, RNG-specific traits such as 
isUniformRNG

 and isSeedable.

A package.d file groups them together so one can still import 
all together via import std.random2.  I've also taken the 
liberty of following the new guideline to place import 
statements as locally as possible; it was striking how easy and 
clean this made things, and it should be easy to port that 
particular change back to std.random.


The new package implements all of the functions, templates and 
range objects from std.random except for the old 
std.random.uniformDistribution, whose name I have cannibalized 
for better purposes.  Some have been updated: the 
MersenneTwisterEngine has been tweaked to match the 
corresponding code from Boost.Random, and this in turn has 
allowed the definition of a 64-bit Mersenne Twister 
(Mt19937_64) and an alternative 32-bit one (Mt11213b).


There are also a number of entirely new entries.  
std.random2.distribution contains not just existing functions 
such as dice and uniform, but also range-based random 
distribution classes UniformDistribution, NormalDistribution 
and DiscreteDistribution; the last of these is effectively a 
range-based version of dice, and is based on Chris Cain's 
excellent work here: 
https://github.com/D-Programming-Language/phobos/pull/1702


The principal weak point in terms of functionality is 
std.random2.device, where the implemented random devices (based 
on Posix' /std/random and /std/urandom) are really very 
primitive and just there to illustrate the principle.  However, 
since their API is pretty simple (they're just input ranges 
with min and max defined) there should be plenty of opportunity 
to improve and extend the internals in future.  Advice and 
patches are welcome for everything, but particularly here :-)


What's become quite apparent in the course of writing this 
package is how much more natural it is for ranges implementing 
randomness to be class objects.  The basic fact that another 
range can store a copy of an RNG internally without creating a 
copy-by-value is merely the start: for example, in the case of 
the class implementation of RandomSample, we no longer need to 
have complications like,


@property auto ref front()
{
assert(!empty);
// The first sample point must be determined here to 
avoid
// having it always correspond to the first element of 
the
// input.  The rest of the sample points are determined 
each

// time we call popFront().
if (_skip == Skip.None)
{
initializeFront();
}
return _input.front;
}

that were necessary to avoid bugs like 
https://d.puremagic.com/issues/show_bug.cgi?id=7936; because 
the class-based implementation copies by reference, we can just 
initialize everything in the constructor.  Similarly, issues 
like https://d.puremagic.com/issues/show_bug.cgi?id=7067 and 
https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.


Obvious caveats about the approach include the fact that 
classes need to be new'd, and questions over whether allocation 
on the heap might create speed issues.  The benchmarks I've run 
(code available in the repo) seem to suggest that at least the 
latter is not a worry, but these are obviously things that need 
to be considered.  My own feeling is that ultimately it is a 
responsibility of the language to offer nice ways to allocate 
classes without necessarily relying on new or the GC.


A few remarks on design and other factors:

   * The new range objects have been implemented as final 
classes for
 speed purposes.  However, I tried another approach where 
the RNG

 class 

Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread monarch_dodra

On Thursday, 20 March 2014 at 19:04:01 UTC, Andrea Fontana wrote:
Still no cmwc rng... IMO cmwc should replace mt as default RNG. 
Faster. Longer period. More passed tests (if i'm right MT 
didn't pass testu01). And it is parametric to get faster result 
or longer period.


http://en.wikipedia.org/wiki/Multiply-with-carry#Complementary-multiply-with-carry_generators


Would a Lagged Fibonacci generator instead fit your needs? I 
wrote one, but it was held of until `random` was updated.


It's goal, first, is to replace the old module. It'll add new 
stuff once it has achieved that goal.


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 01:07:54 UTC, bearophile wrote:
In Bugzilla probably there are many bug reports/enhancement 
requests about std.random, so I suggest you to read them. Some 
of them can be useful, while other are probably already 
addressed in the current (or planned) std.random2.


Yes, indeed.  Quite a few of them _are_ addressed, I think, but 
now that I've got the essentials of the design laid out, I should 
be systematic and go through them.



Another random one that was just commented by Infiltrator:
https://d.puremagic.com/issues/show_bug.cgi?id=5901


Well, you already have the NormalDistribution in 
std.random2.distribution ;-)  I clearly can also implement 
function-only Box-Muller variant that spends 2 random variates to 
generate a single normally-distributed value, as this doesn't 
have the problem of needing to store state or allocate memory, so 
I will add that at some stage.


I'm reluctant to add a specific fastRandom because I think here 
the better option is a really nice range-based algorithm that can 
generate high quality variates at speed (e.g. the Ziggurat 
algorithm is a good candidate here).  There's a quite good review 
of different algorithms here:

http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/grng_acmcs07.pdf

But of course I'm open to arguments here :-)


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 01:32:41 UTC, Chris Williams wrote:
Any chance that you could describe them? I was about to resume 
porting the dcrypt library into Phobos, and had intended to 
flip the classes into structs, to match what the rest of the 
library was doing.


I think there's a good case for a std.random2.crypto module that 
contains RNGs that are specifically suitable for cryptography.  
That said I think the bar here has to be set VERY high, which is 
why I didn't even begin working on it yet.  It has been argued by 
some that where crypto in Phobos is concerned, we shouldn't take 
community contributions but we should hire security experts to 
write the functionality for us.


Anyway, let's keep in touch about this and discuss how we could 
support one another's efforts.


About the issues with value-type RNGs (as monarch_dodra says, 
it's not structs vs. classes per se, as you can implement 
reference types via structs; it's just more finnicky to do so), 
probably the best starting point is to read through the various 
bugs that have been reported as a result of this:

https://d.puremagic.com/issues/show_bug.cgi?id=7067
https://d.puremagic.com/issues/show_bug.cgi?id=7936
https://d.puremagic.com/issues/show_bug.cgi?id=8247
https://d.puremagic.com/issues/show_bug.cgi?id=10322

Although some of these are marked as fixed, the fixes are pretty 
unpleasant and are workarounds rather than solutions of the 
underlying problem.  It may look like only a few issues, but the 
implications are nasty.  We had extensive discussions about this 
over the last year:

http://forum.dlang.org/thread/mailman.259.1357667544.22503.digitalmar...@puremagic.com
http://forum.dlang.org/thread/mailman.1017.1370879340.13711.digitalmar...@puremagic.com
http://forum.dlang.org/thread/mailman.1157.1371497540.13711.digitalmar...@puremagic.com
http://forum.dlang.org/thread/mailman.1209.1371565034.13711.digitalmar...@puremagic.com
http://forum.dlang.org/thread/mailman.443.1377369357.1719.digitalmar...@puremagic.com
http://forum.dlang.org/thread/5218fd04.8040...@webdrake.net

The bottom line is that implementing your RNGs as classes 
automatically gets you out of the worst of these traps by giving 
you reference semantics from the get-go.  Whether there are other 
problems that arise from this that make you prefer another design 
is a question you'll have to answer for yourself -- someone may 
yet come up with an objection that shows my current design is a 
Very Bad Idea ;-)


Anyway, the example with rndGen.take(10).writeln that 
monarch_dodra gave is probably the best argument one can make.  
Imagine a cryptographic application where you're generating 
(supposedly) two different sets of random data, and because of an 
unintended value-type copy like this they turn out to be 
identical.  Insecure much? :-)


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 08:30:09 UTC, ponce wrote:
Related: please consider using parts of SimpleRNG the excellent 
work of John D. Cook which provides many random distributions 
in a compact and documented way.


https://github.com/p0nce/gfm/blob/master/math/gfm/math/simplerng.d 
(here a port)


Good call, I'll take a close look at that.  Can you provide me 
with a link to the original project too?  (Yes, I can just Google 
it, I'm being lazy:-)


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Chris Williams

On Thursday, 20 March 2014 at 21:16:27 UTC, Joseph Rushton
Wakeling wrote:
I think there's a good case for a std.random2.crypto module 
that contains RNGs that are specifically suitable for 
cryptography.  That said I think the bar here has to be set 
VERY high, which is why I didn't even begin working on it yet.  
It has been argued by some that where crypto in Phobos is 
concerned, we shouldn't take community contributions but we 
should hire security experts to write the functionality for us.


To be certain that the implementation doesn't have any security
holes?


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 08:51:08 UTC, monarch_dodra wrote:

Agreed.


There is consensus it seems.  I will make the fix ;-)

I think there is 0 doubt that reference semantics is the way to 
go. An advantage of using class is that it is still *possible* 
to place them on the stack with Scoped, or with some future 
language mechanic. On the other hand, if we implement as 
reference structs, then that's that.


I suppose the one concern I have is whether these reference-type 
RNGs might generate unpleasant unintended effects with other 
range objects in Phobos.  One thing that I really must do now 
that the basic design is in place is to systematically go through 
all the different ways in which these ranges could interact with 
deterministic ranges, and whether there are any issues to address.


Furthermore, even in terms of performance, I think a heap 
allocated PRNG will still flat-out beat the value based one, if 
only because of the size of the damn thing.


I don't know if you or anyone else has run the simple benchmark 
programs I created, but my impression was that for the RNGs and 
other functions here there is no significant speed difference 
between the std.random2 class implementations and their 
std.random struct predecessors.  Where there _is_ a difference it 
seems more likely to be down to algorithm rather than 
class/struct or heap/stack.


For example, my new Mersenne Twister is slightly slower, but 
probably because it's carrying extra parameters compared to that 
of std.random.  On the other hand, generating random numbers by 
foreach'ing over uniform() calls does not seem to have any speed 
difference with popFrontN()'ing over a Uniform Distribution.


That said, being able to allocate them on the malloc heap, and 
not the GC heap, would be (IMO) also a valid design.


A simple and dumb design might be to still implement them with 
value semantic but:

1. Disable postblit.
2. Make .save() return a Random*
This would mean
1. No dangers of accidental copy.
2. Range* is a ForwardRange.
3. Trivially allows GC/malloc/stack allocation.
With good aliases (alias Random = RadomImpl*;), and a make! 
template we could make the default useage transparent to this 
mechanism yet make it easy to get our hands under the hood.


One strict objection here: .save returning a Random* would mean 
that this kind of unittest will fail, no?


auto rng1 = someRandomGenType;
auto rng2 = rng1.save;
rng1.popFrontN(10);
rng2.popFrontN(10);
assert(rng1.front == rng2.front);

More generally, I think that, while I don't object to doing 
complicated stuff behind the scenes to get things simple and easy 
for the user, the problem I have with the above is that it really 
seems to require so much effort to create something which comes 
naturally with the current std.random2 design.


I didn't check the code yet, but a middle ground could be to 
make all constructors private, and disable T.init. Then, we 
force construction through a make! template.


This might not be what's most convenient, but it would allow us 
to potentially change the design at a later stage, without 
breaking user code.


The idea of making constructors private and forcing the user to 
use the convenience functions is a very interesting one.  As long 
as they provide an adequate interface to completely control all 
implementation parameters, it could provide a way to have 
significant leeway in controlling exactly how RNG instances are 
initialized.


On the other hand it really feels obnoxious to cut users off from 
being able to use objects directly :-(


Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


AFAIK, the allocation issue is only for ranges? uniform is 
just a function, I don't think it affected by the issue. Even 
if you are operating on a passed range, either ranges are 
reference semantics, and you take by value, or they are value 
semantic, and you take by ref. Either way, you have to pay for 
the indirection.


I think the issue here is just that it's possible to implement a 
really fast high-quality algorithm for uniformly-distributed 
floating point numbers in [0, 1).  That has all sorts of uses not 
just for Phobos users but also internally in e.g. random 
distributions (for example, it'll give a significant speed boost 
to NormalDistribution).


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 21:42:13 UTC, Chris Williams wrote:

To be certain that the implementation doesn't have any security
holes?


Yes.  Of course, in the current climate one might fear that
they'd be the ones introducing them ... :-)


Re: 1st draft of complete class-based std.random successor

2014-03-20 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 18:43:49 UTC, Chris Williams wrote:
That's only completely true if structs are referred to by 
pointer. ref parameters/returns aren't quite sufficient to keep 
a struct acting as a reference for all purposes.


As far as I can tell, you're thinking of _passing_ struct 
parameters, and here, indeed, passing by ref is sufficient.


The problem comes when you want to _store_ them.  It's not safe 
to just store a pointer, because the (value type) struct that's 
being pointed to might go out of scope and be deleted.


However, you can make structs behave like reference types behave 
like reference types, simply by making them contain (safe) 
references to the actual data they contain.  E.g. (stupidly 
simple example):


struct Foo
{
  private:
int *_a;

  public:
this(int val)
{
_a = new int;
*_a = val;
}

ref int a() @property
{
return *_a;
}
}

unittest
{
auto foo1 = Foo(23);
auto foo2 = foo1;

foo2.a = 4;

writeln(foo1.a);
}

Most of the discussion over RNGs in the last year is about 
whether we need to take a (more sophisticated) variant of this 
kind of approach to implement reference-type semantics for RNGs, 
or whether we should do something different.  std.random2 is ... 
something different ;-)


1st draft of complete class-based std.random successor

2014-03-19 Thread Joseph Rushton Wakeling

Hello all,

As some of you may already know, monarch_dodra and I have spent 
quite a lot of time over the last year discussing the state of 
std.random.  To cut a long story short, there are significant 
problems that arise because the current RNGs are value types 
rather than reference types.  We had quite a lot of back and 
forth on different design ideas, with a lot of helpful input from 
others in the community, but at the end of the day there are 
really only two broad approaches: create structs that implement 
reference semantics internally, or use classes.  So, as an 
exercise, I decided to create a class-based std.random.


The preliminary (but comprehensive) results of this are now 
available here:

https://github.com/WebDrake/std.random2

Besides re-implementing random number generators as classes 
rather than structs, the new code splits std.random2 into a 
package of several different modules:


   * std.random2.generator, pseudo-random number generators;

   * std.random2.device, non-deterministic random sources;

   * std.random2.distribution, random distributions such as 
uniform,

 normal, etc.;

   * std.random2.adaptor, random adaptors such as randomShuffle,
 randomSample, etc.

   * std.random2.traits, RNG-specific traits such as isUniformRNG
 and isSeedable.

A package.d file groups them together so one can still import all 
together via import std.random2.  I've also taken the liberty 
of following the new guideline to place import statements as 
locally as possible; it was striking how easy and clean this made 
things, and it should be easy to port that particular change back 
to std.random.


The new package implements all of the functions, templates and 
range objects from std.random except for the old 
std.random.uniformDistribution, whose name I have cannibalized 
for better purposes.  Some have been updated: the 
MersenneTwisterEngine has been tweaked to match the corresponding 
code from Boost.Random, and this in turn has allowed the 
definition of a 64-bit Mersenne Twister (Mt19937_64) and an 
alternative 32-bit one (Mt11213b).


There are also a number of entirely new entries.  
std.random2.distribution contains not just existing functions 
such as dice and uniform, but also range-based random 
distribution classes UniformDistribution, NormalDistribution and 
DiscreteDistribution; the last of these is effectively a 
range-based version of dice, and is based on Chris Cain's 
excellent work here: 
https://github.com/D-Programming-Language/phobos/pull/1702


The principal weak point in terms of functionality is 
std.random2.device, where the implemented random devices (based 
on Posix' /std/random and /std/urandom) are really very primitive 
and just there to illustrate the principle.  However, since their 
API is pretty simple (they're just input ranges with min and max 
defined) there should be plenty of opportunity to improve and 
extend the internals in future.  Advice and patches are welcome 
for everything, but particularly here :-)


What's become quite apparent in the course of writing this 
package is how much more natural it is for ranges implementing 
randomness to be class objects.  The basic fact that another 
range can store a copy of an RNG internally without creating a 
copy-by-value is merely the start: for example, in the case of 
the class implementation of RandomSample, we no longer need to 
have complications like,


@property auto ref front()
{
assert(!empty);
// The first sample point must be determined here to avoid
// having it always correspond to the first element of the
// input.  The rest of the sample points are determined 
each

// time we call popFront().
if (_skip == Skip.None)
{
initializeFront();
}
return _input.front;
}

that were necessary to avoid bugs like 
https://d.puremagic.com/issues/show_bug.cgi?id=7936; because the 
class-based implementation copies by reference, we can just 
initialize everything in the constructor.  Similarly, issues like 
https://d.puremagic.com/issues/show_bug.cgi?id=7067 and 
https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.


Obvious caveats about the approach include the fact that classes 
need to be new'd, and questions over whether allocation on the 
heap might create speed issues.  The benchmarks I've run (code 
available in the repo) seem to suggest that at least the latter 
is not a worry, but these are obviously things that need to be 
considered.  My own feeling is that ultimately it is a 
responsibility of the language to offer nice ways to allocate 
classes without necessarily relying on new or the GC.


A few remarks on design and other factors:

   * The new range objects have been implemented as final classes 
for
 speed purposes.  However, I tried another approach where the 
RNG

 class templates were abstract classes, and the individual
 parameterizations were 

Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Rikki Cattermole
Out of interest but, shouldn't in the device module have a static 
assert(0, Not implemented yet) type of deal with the 
version(Posix) block?


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread bearophile

Joseph Rushton Wakeling:

Few first comments:

   * std.random2.adaptor, random adaptors such as 
randomShuffle,

 randomSample, etc.


Please don't use stuttering names like 
std.random2.randomShuffle. std.random2.shuffle is enough.



My own feeling is that ultimately it is a responsibility of the 
language to offer nice ways to allocate classes without 
necessarily relying on new or the GC.


I don't think the language is yet there. So I think currently 
this is not a good idea.


Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Joseph Rushton Wakeling
On Wednesday, 19 March 2014 at 23:58:36 UTC, Rikki Cattermole 
wrote:
Out of interest but, shouldn't in the device module have a 
static assert(0, Not implemented yet) type of deal with the 
version(Posix) block?


Not really.  There's still usable functionality in there for all 
architectures (although I'm not sure how practically useful).


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
Do you have a simple but very fast function that generates 
uniforms in [0.0, 1.0]? :-)


No, but it's planned.  Jerro wrote quite a nice one in the course 
of his work on the Ziggurat algorithm, and I'm sure he'd be happy 
for me to adapt it accordingly.


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 00:09:51 UTC, bearophile wrote:
Please don't use stuttering names like 
std.random2.randomShuffle. std.random2.shuffle is enough.


I don't object to rewriting the names if there's a valid case for 
it, but it does seem to me to be important to try and match as 
much as possible the names that are already out there in 
std.random.  The idea is to minimize the amount of rewriting 
anyone will have to do to adapt their code, and as far as I can 
tell where the contents of std.random2.adaptor are concerned 
(randomShuffle, randomCover, randomSample) it should require no 
rewriting at all.


Besides, while std.random2.adaptor.randomShuffle may be the 
fully-qualified name, in practice, no one will write all that 
out, so the redundancy is less bad; and in any case, as any 
magician will tell you, a shuffle is not necessarily random ;-)


I don't think the language is yet there. So I think currently 
this is not a good idea.


If the aim were to overwrite std.random, I would agree with you, 
but there is no need to do that.  It's named std.random2 for a 
reason :-)


However, I do think that merging it into Phobos (assuming all 
other factors are OK) may have to be conditional on improvements 
in the available allocation strategies.


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread bearophile

Joseph Rushton Wakeling:

No, but it's planned.  Jerro wrote quite a nice one in the 
course of his work on the Ziggurat algorithm, and I'm sure he'd 
be happy for me to adapt it accordingly.


Note: I meant a simple but very fast function that generates just 
one value in [0.0, 1.0] (not a range).



I don't object to rewriting the names if there's a valid case 
for it, but it does seem to me to be important to try and match 
as much as possible the names that are already out there in 
std.random.


It's the best chance to improve naming, so do not throw it away 
for nothing:

https://d.puremagic.com/issues/show_bug.cgi?id=9106


The idea is to minimize the amount of rewriting anyone will 
have to do to adapt their code,


If you want you can keep a deprecated randomShuffle alias name 
for some time in std.random2.



Besides, while std.random2.adaptor.randomShuffle may be the 
fully-qualified name, in practice, no one will write all that 
out, so the redundancy is less bad;


I agree. But better to improve names when you have a (the only) 
chance.



However, I do think that merging it into Phobos (assuming all 
other factors are OK) may have to be conditional on 
improvements in the available allocation strategies.


We will probably have the nice Andrei's allocators in Phobos, but 
not in a short time. So I suggest to not rely on them for 
std.random2.


Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Joseph Rushton Wakeling

On Thursday, 20 March 2014 at 00:39:43 UTC, bearophile wrote:
Note: I meant a simple but very fast function that generates 
just one value in [0.0, 1.0] (not a range).


There will be both. :-)

Off the top of my head I'm not sure whether the interval will be 
[0.0, 1.0], [0.0, 1.0) or whether it might be possible to make it 
work with arbitrary boundaries.  If I recall right, most 
uniform01 implementations are [0.0, 1.0) ... ?


It's the best chance to improve naming, so do not throw it away 
for nothing:

https://d.puremagic.com/issues/show_bug.cgi?id=9106

If you want you can keep a deprecated randomShuffle alias name 
for some time in std.random2.


Yes, in that case, I'd be happy to make the change (and maintain 
the old names via aliases).  Thanks for pointing me to the bug 
report; I'd forgotten that this was an open issue :-)


We will probably have the nice Andrei's allocators in Phobos, 
but not in a short time. So I suggest to not rely on them for 
std.random2.


No, I don't intend to rely on them arriving soon.  But while of 
course a random3 is always possible too, I'd rather not be faced 
with the situation of needing breaking changes to handle support 
for alternative allocation strategies.  So if necessary, I'd 
rather maintain std.random2 outside of Phobos for a while and get 
things right when it finally lands, than push it in too early and 
need to make breaking changes.


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Rikki Cattermole
On Thursday, 20 March 2014 at 00:15:22 UTC, Joseph Rushton 
Wakeling wrote:
On Thursday, 20 March 2014 at 00:05:20 UTC, Joseph Rushton 
Wakeling wrote:
Not really.  There's still usable functionality in there for 
all architectures (although I'm not sure how practically 
useful).


Just to expand on that remark: my impression is that individual 
random devices are inevitably going to be 
architecture-dependent.
 /dev/random and /dev/urandom are Posix devices; Windows AFAIK 
has its own alternative.  So the broad idea is that you'd have 
as much generic functionality as possible available to all 
architectures (mostly related to what sources you read from; a 
file, a socket, something else?), and then individual 
architecture-dependent aliases would map this to particular 
random sources available to them.


Then, finally, you'd have some default alias RandomDevice that 
would point to an appropriate architectural default; so e.g.


version (Posix)
{
alias RandomDevice = DevURandom!uint;
}
else version (Windows)
{
alias RandomDevice = ...
}
// etc.

... so, unless you were quite specific about your requirements, 
90% of the time you could just use RandomDevice and expect it 
to Just Work whatever your platform.


But as random devices are not my strongest area of expertise, 
I'll happily take advice here.


For version blocks of code, I try to make sure they implement the 
same interfaces like this one. To limit the possible issues.

It just makes things a little more cleaner for API users.

In the case that this isn't production ready the static assert 
can be used as a TODO type of thing. Forcibly telling you it 
isn't done yet. Its better than silently going into production 
and finding out that a main platform isn't ready.


But this is just my preference.


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread bearophile

Joseph Rushton Wakeling:

Thanks for pointing me to the bug report; I'd forgotten that 
this was an open issue :-)


In Bugzilla probably there are many bug reports/enhancement 
requests about std.random, so I suggest you to read them. Some of 
them can be useful, while other are probably already addressed in 
the current (or planned) std.random2.


Another random one that was just commented by Infiltrator:
https://d.puremagic.com/issues/show_bug.cgi?id=5901

Bye,
bearophile


Re: 1st draft of complete class-based std.random successor

2014-03-19 Thread Chris Williams
On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton 
Wakeling wrote:

Hello all,

As some of you may already know, monarch_dodra and I have spent 
quite a lot of time over the last year discussing the state of 
std.random.  To cut a long story short, there are significant 
problems that arise because the current RNGs are value types 
rather than reference types.


Any chance that you could describe them? I was about to resume 
porting the dcrypt library into Phobos, and had intended to flip 
the classes into structs, to match what the rest of the library 
was doing.