Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]

2016-11-27 Thread Jonathan M Davis via Digitalmars-d
On Saturday, November 26, 2016 12:24:47 Andrei Alexandrescu via Digitalmars-
d wrote:
> On 11/26/2016 01:55 AM, Martin Nowak wrote:
> > On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
> >> Maybe non-copyability needs to become a requirement for InputRanges.
> >
> > Could have an optional .clone if copying is supported.
> > What would be an InputRange where copying is correct?
>
> Input ranges with disabled this(this) would be a major breaking change
> that we probably cannot bear (and shouldn't because the gain is small).
> I think an input range would be at best "pure reference" in the sense
> that popFront advances all copies to the same position. -- Andrei

For a while now, I've been tempted to argue that we should require that
well-behaved input ranges be reference types (or at least have reference
semantics). It just seems like too much misbehaves if they're not, and
arguably, the whole reason that they're input ranges and not forward ranges
(aside from the programmer just not bothering to implement save) is that
they effectively have reference semantics whether they're actually coded
that way or not. We obviously can't test for that behavior at compile time,
but it could easily be on the list of things that are documented as a
requirement for well-behaved ranges.

I'm also tempted to argue that well-behaved forward ranges and greater
should have value semantics (in the sense that dynamic arrays do), but
that's a much bigger change (it effectively makes save unnecessary), and
proper use of save works around that problem (though it's very common that
save isn't used as often as it should be). It would clean up a lot of the
mistakes that occur with forward ranges though.

However, regardless, a non-copyable range would not play nicely at all with
how range-based code works right now. You'd have to use ref all over the
place, which range-based code simply doesn't do and would not play well with
function chaining, since you'd need lvalues, and since you couldn't copy the
range to the stack to make it an lvalue, that seems like it would go nowhere
fast.

- Jonathan M Davis



Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]

2016-11-26 Thread Andrei Alexandrescu via Digitalmars-d

On 11/26/2016 01:55 AM, Martin Nowak wrote:

On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:

Maybe non-copyability needs to become a requirement for InputRanges.


Could have an optional .clone if copying is supported.
What would be an InputRange where copying is correct?


Input ranges with disabled this(this) would be a major breaking change 
that we probably cannot bear (and shouldn't because the gain is small). 
I think an input range would be at best "pure reference" in the sense 
that popFront advances all copies to the same position. -- Andrei


Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]

2016-11-26 Thread Joseph Rushton Wakeling via Digitalmars-d

On Saturday, 26 November 2016 at 06:55:24 UTC, Martin Nowak wrote:

Should we split off this discussion to a dlang-study thread?


I would personally really welcome that, but subject to the 
understanding that people agree to look seriously at random 
algorithms (like RandomSample) and not focus the discussion 
solely on RNGs.


Re: Mir Random [WIP]

2016-11-26 Thread Joseph Rushton Wakeling via Digitalmars-d

On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
Yes the problems are inadvertent copies and a disabled 
this(this) would prevent that. RNGs should have unique 
ownership of their internal state.
Using InputRanges with phobos is somewhat clumsy. Maybe people 
have been burned by those and now generally consider 
InputRanges as broken.
Maybe non-copyability needs to become a requirement for 
InputRanges.


I remember you made that suggestion of disabled this(this) to me 
a while back, and I did use it for this small collection of RNGs:

https://github.com/WebDrake/dxorshift

We should add a reference RNG that transforms any other RNG 
into an input range that shares the actual RNG.


We already have refRange for that, no?
Also refCounted(Random(unpredictableSeed)) should work.


RefRange still has the issue that it only forwards the range 
primitives and not other properties such as the `isUniformRandom` 
enum.  But those are probably fixable.


However, these approaches basically cover the case of something 
where an instance is initialized high up in the program and 
passed around by ref throughout its lifetime.  That doesn't 
address the question of how random _algorithms_ like 
`RandomSample` could be updated for safety (since they might 
often need to be initialized multiple times in the inner loops of 
a program).  See:

https://forum.dlang.org/post/gnlvyogolkmocujtn...@forum.dlang.org
https://forum.dlang.org/post/gpsilefdillwtleuw...@forum.dlang.org

for some discussion of the problem.


Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]

2016-11-25 Thread Martin Nowak via Digitalmars-d

On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
Maybe non-copyability needs to become a requirement for 
InputRanges.


Could have an optional .clone if copying is supported.
What would be an InputRange where copying is correct?


We already have refRange for that, no?
Also refCounted(Random(unpredictableSeed)) should work.


Same scheme works for files with only plain int fds streams, 
non-copyable/unique ownership, and move, refRange, or refCounted 
for passing and sharing.


Should we split off this discussion to a dlang-study thread?


Re: Mir Random [WIP]

2016-11-25 Thread Martin Nowak via Digitalmars-d
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:

On 11/22/16 7:30 PM, John Colvin wrote:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
Alexandrescu wrote:
The proposed design has disabled copy ctor and uses opCall() 
for a new element. That seems to be a difference without a 
distinction from an input range that has disabled copy ctor and 
offers the input range primitives.


Yes the problems are inadvertent copies and a disabled this(this) 
would prevent that. RNGs should have unique ownership of their 
internal state.
Using InputRanges with phobos is somewhat clumsy. Maybe people 
have been burned by those and now generally consider InputRanges 
as broken.
Maybe non-copyability needs to become a requirement for 
InputRanges.


We should add a reference RNG that transforms any other RNG 
into an input range that shares the actual RNG.


We already have refRange for that, no?
Also refCounted(Random(unpredictableSeed)) should work.


Re: Mir Random [WIP]

2016-11-25 Thread Kagamin via Digitalmars-d
On Thursday, 24 November 2016 at 14:22:05 UTC, Jonathan M Davis 
wrote:
Then call popFront or drop before passing it along if you're 
paranoid about it.


There's little need for it, as it was pointed out earlier that 
the generate algorithm does the right thing and needs only 
opCall, also a nice example of orthogonality and composability.


Re: Mir Random [WIP]

2016-11-25 Thread Jonathan M Davis via Digitalmars-d
On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis
>
> wrote:
> > It's quite feasible if you don't calculate front until it's
> > called or popFront is called, and then you cache the result so
> > that subsequent calls to front prior to the call to popFront
> > return the same result, but it creates additional overhead,
> > because then every call to front and popFront has to check
> > whether the value has been calculated yet. And if every range
> > in the chain of ranges has to do that, then that additional
> > overhead is going to add up.
>
> Yes, indeed.  I actually came up with a general purpose solution
> for that a while back via template mixin (although the version in
> the post linked to is not the final version: I updated it to use
> HaraldZealot's suggestion of making the frontImplementation and
> popFrontImplementation the template parameters):
> https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d@pu
> remagic.com
> > In general, it's just cleaner to either calculate front on
> > every call to front (which doesn't make sense in the case of
> > random number generators) or to calculate the first front upon
> > construction and be done with it.
>
> In some ways I think it's cleaner to not privilege the first
> front value and go for "true" laziness for all front values --
> particularly when dealing with stuff like RandomSample.

It's much messier from an implementation standpoint. If the constructor
checks for empty and caches front if the range it was given was non-empty,
then every call to front can just return front, and every call to popFront
can just pop the element off and do whatever it does to turn the front from
the wrapped range into the front of this range and then cache it. If the
constructor doesn't do that work, then every call to front and every call to
popFront have to check whether they're the first time that either has been
called and _then_ they end up doing all of the same work anyway. So, there's
more code, and it's less efficient. And if every range in the chain does
that, then you're getting a lot of extra checks just to determine whether
it's the first time that front or popFront is being called. Now, if there's
never any caching of the front value, and front always calculates it, then
that might actually be more efficient if front is only ever called once per
element, because you don't get the cost of copying the element to cache it,
but if front gets called multiple times (which happens fairly often, I
think), then the cost is definitely greater.

You're never going to be able to rely on a fully lazy front anyway unless we
specified it as part of the range API that all ranges work that way, and as
it stands, very few ranges do.

> The downside is that `.front` becomes non-const.

Well, realistically, as it stands, ranges are utterly useless with const
anyway. We'd have to have a standard mechanism for getting a tail-const copy
of a range from a const or immutable range, and we don't.

> Sometimes I
> wonder whether it wouldn't have been better to require that
> `popFront()` be called before even the first call to `.front`,
> and place the onus on `popFront()` to handle any special-casing
> of the first `.front` value.

That would be terrible with chaining ranges. It also would be _very_
error-prone. It's also heading into two-phase initialization territory,
which is almost always a bad idea.

> > And I think that the general consensus has been that
> > calculating front in the constructor and popFront and caching
> > works better than calculating it on every call to front, but
> > that doesn't always work (e.g. map), and that caching does
> > cause issues occasionally.
>
> The case I always think of is: what if your input range is
> designed to correspond to sensor observations made at a
> particular time?  This is a case where cacheing the first value
> on construction is very problematic.
>
> For RNGs it's really not such a big deal so long as successive
> variates are independent, but for random algorithms I think the
> laziness matters, since their popFront is essentially
> IO-dependent (in the Haskell-style sense that randomness is IO).

Honestly, I don't think that it really works to have a range where the value
of its elements depend on when you call front or popFront. The API lets you
do it, but you're going to run into plenty of problems if you do unless you
don't care about the frequency at which the values are generated. Part of
the problem here is that ranges and the algorithms that use them are really
designed with arrays in mind. A non-random-access range reduces those
capabilities, and a basic input range doesn't actually require that the
elements be reproducible aside from multiple calls to front giving the same
result, but it's still the case that it's pretty much assumed that a range
is a fixed set of values, and pretty much 

Re: Mir Random [WIP]

2016-11-25 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis 
wrote:
It's quite feasible if you don't calculate front until it's 
called or popFront is called, and then you cache the result so 
that subsequent calls to front prior to the call to popFront 
return the same result, but it creates additional overhead, 
because then every call to front and popFront has to check 
whether the value has been calculated yet. And if every range 
in the chain of ranges has to do that, then that additional 
overhead is going to add up.


Yes, indeed.  I actually came up with a general purpose solution 
for that a while back via template mixin (although the version in 
the post linked to is not the final version: I updated it to use 
HaraldZealot's suggestion of making the frontImplementation and 
popFrontImplementation the template parameters):

https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmar...@puremagic.com

In general, it's just cleaner to either calculate front on 
every call to front (which doesn't make sense in the case of 
random number generators) or to calculate the first front upon 
construction and be done with it.


In some ways I think it's cleaner to not privilege the first 
front value and go for "true" laziness for all front values -- 
particularly when dealing with stuff like RandomSample.


The downside is that `.front` becomes non-const.  Sometimes I 
wonder whether it wouldn't have been better to require that 
`popFront()` be called before even the first call to `.front`, 
and place the onus on `popFront()` to handle any special-casing 
of the first `.front` value.


And I think that the general consensus has been that 
calculating front in the constructor and popFront and caching 
works better than calculating it on every call to front, but 
that doesn't always work (e.g. map), and that caching does 
cause issues occasionally.


The case I always think of is: what if your input range is 
designed to correspond to sensor observations made at a 
particular time?  This is a case where cacheing the first value 
on construction is very problematic.


For RNGs it's really not such a big deal so long as successive 
variates are independent, but for random algorithms I think the 
laziness matters, since their popFront is essentially 
IO-dependent (in the Haskell-style sense that randomness is IO).


Re: Mir Random [WIP]

2016-11-24 Thread Joseph Rushton Wakeling via Digitalmars-d
On Thursday, 24 November 2016 at 08:36:41 UTC, Andrea Fontana 
wrote:

On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
Interesting. Could you please add a couple of links about 
that? -- Andrei


http://xoroshiro.di.unimi.it/


Very short public domain implementation:
http://xoroshiro.di.unimi.it/xoroshiro128plus.c


Implementation in D, written during DConf 2016 ;-)
https://github.com/WebDrake/dxorshift


Re: Mir Random [WIP]

2016-11-24 Thread Timon Gehr via Digitalmars-d

On 24.11.2016 14:50, Kagamin wrote:

On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:

Because it's correct. If a.length is larger than int.max then
cast(int)a.length will half the time be negative and therefore a
simple rightshift would not be equivalent to division by 2.


It can't be possibly correct when a.length is larger than int.max
because it doesn't recover information lost in a narrowing conversion.


The code does not specify that this information should be recovered. Why 
is this the compiler's problem?


Re: Mir Random [WIP]

2016-11-24 Thread Ilya Yaroshenko via Digitalmars-d

On Thursday, 24 November 2016 at 17:04:43 UTC, John Colvin wrote:


If druntime was initialised by default using 
__attribute__((constructor)) for static and linker .init for 
shared libraries, would that be good enough for you*? I feel 
like you're limiting your design choices because of a 
relatively small and simple implementation shortcoming.


* remember, we don't currently guarantee ABI compatibility 
between compiler versions even if you don't use druntime/phobos.


No, it should work without DRuntime. Nicholas wrote that it will 
work on GPU using dcompute :-) BetterC is the goal for the future 
Mir-Phobos, hehe


Re: Mir Random [WIP]

2016-11-24 Thread John Colvin via Digitalmars-d
On Thursday, 24 November 2016 at 16:09:23 UTC, Ilya Yaroshenko 
wrote:
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
Alternatively, you could just do rndGen().take(1).front, and 
as long as rndGen() gives you a reference type, it works just 
fine. Unfortunately, std.random did not use reference types 
for its ranges. _That_ is the big mistake of std.random and 
the main item that needs to be fixed. There are some other 
subtleties (e.g. it's useful to be able to save the state of a 
random number generating range, but you don't necessarily 
really want it to be a forward range), but those are minor in 
comparison to the mistake of std.random using value types 
rather than reference types for ranges.


- Jonathan M Davis


The fix is opCall syntax. Reference types are classes, which 
would not work without linking DRuntime and Phobos (BetterC).


If druntime was initialised by default using 
__attribute__((constructor)) for static and linker .init for 
shared libraries, would that be good enough for you*? I feel like 
you're limiting your design choices because of a relatively small 
and simple implementation shortcoming.


* remember, we don't currently guarantee ABI compatibility 
between compiler versions even if you don't use druntime/phobos.


Re: Mir Random [WIP]

2016-11-24 Thread Ilya Yaroshenko via Digitalmars-d
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
Alternatively, you could just do rndGen().take(1).front, and as 
long as rndGen() gives you a reference type, it works just 
fine. Unfortunately, std.random did not use reference types for 
its ranges. _That_ is the big mistake of std.random and the 
main item that needs to be fixed. There are some other 
subtleties (e.g. it's useful to be able to save the state of a 
random number generating range, but you don't necessarily 
really want it to be a forward range), but those are minor in 
comparison to the mistake of std.random using value types 
rather than reference types for ranges.


- Jonathan M Davis


The fix is opCall syntax. Reference types are classes, which 
would not work without linking DRuntime and Phobos (BetterC). 
Refcounting is to slow to implement for users.
Note, that Engines is only backend RNGs. There are also 
nonuniform distributions (WIP). See the example of users code: 
https://forum.dlang.org/post/nyvtoejvmsaolzaqy...@forum.dlang.org 
.


It is very difficult to implement both user random variable and 
Engine using Range API. Note, that code should work without 
DRuntime and should be simple (the example is user code).


Ilya


Re: Mir Random [WIP]

2016-11-24 Thread Jonathan M Davis via Digitalmars-d
On Thursday, November 24, 2016 13:54:50 Kagamin via Digitalmars-d wrote:
> On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis
>
> wrote:
> > How so? Because someone might call range.front again without
> > bothering to call popFront?
>
> That's what everything in std.algorithm does.

Then call popFront or drop before passing it along if you're paranoid about
it. If it's a serious concern that this be fixed in general, then the
obvious fix to that would be to make it so that rndGen() just called
popFront before returning it. Heck, if rndGen() were guaranteed to call
popFront when it was called rather than simply returning the range, then you
could just do rndGen().front whenever you wanted the equivalent of rand(),
making it trivial to use rndGen() both for cases where you wanted a single
number and for cases where you wanted a range of them - and without worrying
about front being stale.

- Jonathan M Davis



Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
How so? Because someone might call range.front again without 
bothering to call popFront?


That's what everything in std.algorithm does.


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d

On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:
Because it's correct. If a.length is larger than int.max then 
cast(int)a.length will half the time be negative and therefore 
a simple rightshift would not be equivalent to division by 2.


It can't be possibly correct when a.length is larger than int.max 
because it doesn't recover information lost in a narrowing 
conversion.


Re: Mir Random [WIP]

2016-11-24 Thread Jonathan M Davis via Digitalmars-d
On Thursday, November 24, 2016 09:05:34 Kagamin via Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis
>
> wrote:
> > though I think that using the comma operator like that is
> > deprecated now. Adding a helper function such as
> >
> > auto getNext(R)(ref R range)
> > if(isInputRange!R)
> > {
> > range.popFront();
> > return range.front;
> > }
> >
> > would solve that problem.
>
> It's the same behavior and suffers from the same problem of reuse
> of RNG output.

How so? Because someone might call range.front again without bothering to
call popFront? If you're paranoid about that, then it can call popFront
again before returning (though that would be wasteful).

My take on it is that if you just call popFront before using the random
number generator range, then you don't have to worry about what any other
code that used it did.

Regardless, the range API is _way_ more useful in general than something
like rand(). And if anyone is trying to avoid ranges with random numbers, I
think that they're just making their life harder. Occasionally, it's useful
to get just one random number, in which case, having to deal with the range
API over rand() is kind of annoying, but in general, it's not a problem, and
the wrapper function that I suggested basically gives you rand() from a
range of random numbers.

Alternatively, you could just do rndGen().take(1).front, and as long as
rndGen() gives you a reference type, it works just fine. Unfortunately,
std.random did not use reference types for its ranges. _That_ is the big
mistake of std.random and the main item that needs to be fixed. There are
some other subtleties (e.g. it's useful to be able to save the state of a
random number generating range, but you don't necessarily really want it to
be a forward range), but those are minor in comparison to the mistake of
std.random using value types rather than reference types for ranges.

- Jonathan M Davis



Re: Mir Random [WIP]

2016-11-24 Thread John Colvin via Digitalmars-d

On Thursday, 24 November 2016 at 10:41:44 UTC, Kagamin wrote:
On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin 
wrote:
I was talking about andrei's example. He has a cast(int) in it 
before the division.


And you think that compilation of cast(int)a.length/2 to 3 
instructions is optimized just fine? How is it better that one 
shift?


Because it's correct. If a.length is larger than int.max then 
cast(int)a.length will half the time be negative and therefore a 
simple rightshift would not be equivalent to division by 2.


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d

On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote:
I was talking about andrei's example. He has a cast(int) in it 
before the division.


And you think that compilation of cast(int)a.length/2 to 3 
instructions is optimized just fine? How is it better that one 
shift?


Re: Mir Random [WIP]

2016-11-24 Thread John Colvin via Digitalmars-d

On Thursday, 24 November 2016 at 10:14:27 UTC, Kagamin wrote:
On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin 
wrote:
That's because of the cast(int), dividing by two is optimised 
just fine.


What about Andrei's example?


I was talking about andrei's example. He has a cast(int) in it 
before the division.


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d

On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote:
That's because of the cast(int), dividing by two is optimised 
just fine.


What about Andrei's example?


Re: Mir Random [WIP]

2016-11-24 Thread John Colvin via Digitalmars-d

On Thursday, 24 November 2016 at 08:55:18 UTC, Kagamin wrote:
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
Alexandrescu wrote:

[Details needed]

I just took a look https://godbolt.org/g/EMy6X4, it's 
happening.


That's three instructions instead of one shr eax,1


That's because of the cast(int), dividing by two is optimised 
just fine. You can even do this:


int foo(int a) { return a / 2; }
uint foo(uint a) { return a / 2; }
int bar(int a)
{
if (a > 0) return a / 2;
else assert(0);
}

and get

int example.foo(int):  // slow, to deal with sign
mov eax, edi
shr eax, 31
add eax, edi
sar eax
ret
uint example.foo(uint):  // fast because no sign
mov eax, edi
shr eax
ret
int example.bar(int):  // single shift because sign always 0
testedi, edi
jle .L8
mov eax, edi
sar eax
ret
.L8:
ud2


This should help explain why the extra/different instructions are 
necessary for signed:


int foo0(int a)
{
asm
{
naked;
mov EAX, EDI;
shr EAX, 31;
add EAX, EDI;
sar EAX, 1;
ret;
}
}

int foo1(int a)
{
asm
{
naked;
mov EAX, EDI;
sar EAX, 1;
ret;
}
}

int foo2(int a)
{
asm
{
naked;
mov EAX, EDI;
shr EAX, 1;
ret;
}
}


void main()
{
import std.stdio;
foreach(i; [int.min, -1, 0, 1, int.max])
writeln(foo0(i), ' ', foo1(i), ' ', foo2(i));
}

output:

-1073741824 -1073741824 1073741824
0 -1 2147483647
0 0 0
0 0 0
1073741823 1073741823 1073741823


Re: Mir Random [WIP]

2016-11-24 Thread Timon Gehr via Digitalmars-d

On 23.11.2016 00:55, Andrei Alexandrescu wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:

 - `opCall` API instead of range interface is used (similar to C++)


This seems like a gratuitous departure from common D practice. Random
number generators are most naturally modeled in D as infinite ranges. --
Andrei


I would posit that it is not actually natural to model random numbers as 
ranges. Not every random object is a sequence. Different samples are 
(ideally) independent, so there is little conceptual motivation to group 
them together -- it just increases the logistic overhead needed to get 
at the single samples. I.e., the most natural way to use a PRNG range is 
as follows:


PRNG range;
auto sample(){
auto r=range.front;
range.popFront();
return r;
}

PRNGs are an example where global state is actually desirable, because 
in contrast to most other programming tasks, usually less predictability 
is good.


Re: Mir Random [WIP]

2016-11-24 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:

On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
I started with Engines as basis. The library will be very 
different
comparing with Phobos and _any_ other RNG libraries in terms 
of floating

point generation quality. All FP generation I have seen are not
saturated (amount of possible unique FP values are very small 
comparing
with ideal situation because of IEEE arithmetic). I have not 
found the
idea described by others, so it may be an article in the 
future.


Is this a design matter, or an implementation matter? Could you 
implement the superior FP generation on the existing std.random 
API?


Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable 
[a, b) was added. `rand` dose not use IEEE arithmetic to generate 
a real random number. --Ilya


Re: Mir Random [WIP]

2016-11-24 Thread Ilya Yaroshenko via Digitalmars-d

On Thursday, 24 November 2016 at 08:59:03 UTC, Kagamin wrote:
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
wrote:
Current adaptor should be used in a function scope. (Would be 
great to have a DIP for that kind of semantic check). An RC 
adaptor can be added too. First we need to find a real world 
use case where we need to store a random range outside of a 
function. -- Ilya


You want to instantiate and seed Mt every time you call a 
function that may need random numbers?


A function can use a global RNG defined by a user or accepts RNG 
by reference. Range adaptor stores pointer, not value.


Re: Mir Random [WIP]

2016-11-24 Thread Timon Gehr via Digitalmars-d

On 24.11.2016 09:55, Kagamin wrote:

On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:

[Details needed]

I just took a look https://godbolt.org/g/EMy6X4, it's happening.


That's three instructions instead of one shr eax,1


That would be wrong code. Cast to int after dividing.


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis 
wrote:
though I think that using the comma operator like that is 
deprecated now. Adding a helper function such as


auto getNext(R)(ref R range)
if(isInputRange!R)
{
range.popFront();
return range.front;
}

would solve that problem.


It's the same behavior and suffers from the same problem of reuse 
of RNG output.


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
wrote:
Current adaptor should be used in a function scope. (Would be 
great to have a DIP for that kind of semantic check). An RC 
adaptor can be added too. First we need to find a real world 
use case where we need to store a random range outside of a 
function. -- Ilya


You want to instantiate and seed Mt every time you call a 
function that may need random numbers?


Re: Mir Random [WIP]

2016-11-24 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
Alexandrescu wrote:

[Details needed]

I just took a look https://godbolt.org/g/EMy6X4, it's happening.


That's three instructions instead of one shr eax,1


Re: Mir Random [WIP]

2016-11-24 Thread Andrea Fontana via Digitalmars-d

On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
Interesting. Could you please add a couple of links about 
that? -- Andrei


http://xoroshiro.di.unimi.it/


Very short public domain implementation:
http://xoroshiro.di.unimi.it/xoroshiro128plus.c



Re: Mir Random [WIP]

2016-11-23 Thread Somebody via Digitalmars-d
I have to agree that creating a different API that uses opCall 
or whatever instead of a the range API is a bad idea, 
particularly when a simple helper function would make it 
possible to use a random number generator in a fashion more 
like rand() for the cases where that's preferable, and for a 
lot of cases, having the range API is _extremely_ useful.


- Jonathan M Davis


Would, assuming the OpCall() interface, generate!(() => rng()) 
make an input range out of it easily? Or am I missing something?




Re: Mir Random [WIP]

2016-11-23 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, November 23, 2016 15:29:14 Kagamin via Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei
>
> Alexandrescu wrote:
> > This claim would apply to all ranges. There seems to be
> > evidence it is unfounded.
> >
> > The main argument for using the range interface for RNGs is
> > reuse of abstraction. Minute implementation matters are much
> > less important. The main counter-argument is that the
> > abstraction is not fitting well and another abstraction (such
> > as opCall) is more advantageous.
>
> Consider this okayish looking code:
> consume(rng());
> consume(rng.take(2)); //reuses previous value
> consume(rng()); //discards unused value

In the cases where I don't really need a range of random numbers, I've
typically ended up doing stuff like

auto nextNum = rndGen().popFront(), rndGen().front;

though I think that using the comma operator like that is deprecated now.
Adding a helper function such as

auto getNext(R)(ref R range)
if(isInputRange!R)
{
range.popFront();
return range.front;
}

would solve that problem. And there are plenty of cases where having a range
of random numbers is extremely useful. After having dealt with, std.random,
it would really suck to have to deal with a random number generator that was
not range-based. std.random has problems such as not using reference types
for its ranges, but using the range API in the way it does is extremely
useful.

- Jonathan M Davis



Re: Mir Random [WIP]

2016-11-23 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, November 23, 2016 08:54:30 Andrei Alexandrescu via 
Digitalmars-d wrote:
> On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
> > I think that Range API is bad and useless overengineering for RNGs.
>
> So it either "takes 1 minute" to change the interface from opCall to
> ranges (i.e. they're virtually the same thing), or it's the above.
> There's precious little overengineering that can be done in 1 minute
>
> :o). I see you did that in a dozen lines in RandomRangeAdaptor.
>
> I understand you believe the range interface is unnecessary or overkill
> for random number generators. I may even agree for certain cases.
> However, there are a few arguments you may want to consider:
>
> * By virtue of working with D, everybody know what a range is.
> Presenting the RNGs that way instantly evokes a host of knowledge,
> understanding, and idioms.
>
> * D users (or at least a fraction) and several algorithms understand the
> notion of an infinite range and make salient decisions based on that. A
> range interface automatically taps into that.

I've frequently found that in cases where I'm not operating on a range of
values, I really just want a way to get a random number, and having to call
both popFront and front to get it is a bit annoying (it's actually one of
the few places that I've ever used the comma operator). That being said, I
think that the correct solution to that is to add a function to std.random
that takes a range, calls popFront on it, and then returns front so that it
can work with any of the random number generators and avoids problems with
whether or not popFront was called prior to calling front. I have to agree
that creating a different API that uses opCall or whatever instead of a
the range API is a bad idea, particularly when a simple helper function
would make it possible to use a random number generator in a fashion more
like rand() for the cases where that's preferable, and for a lot of cases,
having the range API is _extremely_ useful.

- Jonathan M Davis



Re: Mir Random [WIP]

2016-11-23 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, November 23, 2016 14:02:36 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei
>
> Alexandrescu wrote:
> > Well I do see another small problem with
> > https://github.com/libmir/mir-random/blob/master/source/random/algorithm
> > .d#L19. It creates the first value eagerly upon construction. That's
> > just a bit awkward. It's not the first time I'm seeing this issue, and
> > to be honest am a bit bummed about it. It's a consequence of all ranges
> > needing to buffer at least one element. -- Andrei
> Yea, this touches on one of my more general concerns related to
> ranges, randomness and laziness.  Part of the trouble is we don't
> have a really good general design for how to implement _truly_
> lazy ranges, where the value of `front` is not determined until
> the point where you first access it.
>
> This has particular relevance to e.g. RandomSample and
> RandomCover.

It's quite feasible if you don't calculate front until it's called or
popFront is called, and then you cache the result so that subsequent calls
to front prior to the call to popFront return the same result, but it
creates additional overhead, because then every call to front and popFront
has to check whether the value has been calculated yet. And if every range
in the chain of ranges has to do that, then that additional overhead is
going to add up.

In general, it's just cleaner to either calculate front on every call to
front (which doesn't make sense in the case of random number generators) or
to calculate the first front upon construction and be done with it. And I
think that the general consensus has been that calculating front in the
constructor and popFront and caching works better than calculating it on
every call to front, but that doesn't always work (e.g. map), and that
caching does cause issues occasionally.

- Jonathan M Davis



Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/16 3:05 PM, Ilya Yaroshenko wrote:

On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote:

On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:

On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu
wrote:


A tip for the range adaptor: have it allocate and own the generator
internally. That way it's easy to make it reference counted
economically.


Current adaptor should be used in a function scope. (Would be great to
have a DIP for that kind of semantic check). An RC adaptor can be added
too. First we need to find a real world use case where we need to store
a random range outside of a function. -- Ilya


Yah, problem here is we can't make that @safe. -- Andrei


Can we improve D to make it safe?


It's difficult in the general case for objective reasons (modular 
analysis). For this, we may be able to take an argument by ref, save its 
address inside, add some scope attributes. It's a project. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/16 11:10 AM, Kagamin wrote:

On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:

That seems to be a minor matter. Of course at best some measurements
would be in order.

Yes, it's just Joseph worries about microoptimizations.


Though we saw that the compiler can't optimize some simple operations
like division by 2.


[Details needed]

I just took a look https://godbolt.org/g/EMy6X4, it's happening.


Andrei



Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei 
Alexandrescu wrote:

On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
Alexandrescu wrote:


A tip for the range adaptor: have it allocate and own the 
generator
internally. That way it's easy to make it reference counted 
economically.


Current adaptor should be used in a function scope. (Would be 
great to
have a DIP for that kind of semantic check). An RC adaptor can 
be added
too. First we need to find a real world use case where we need 
to store

a random range outside of a function. -- Ilya


Yah, problem here is we can't make that @safe. -- Andrei


Can we improve D to make it safe?


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:

On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:


A tip for the range adaptor: have it allocate and own the generator
internally. That way it's easy to make it reference counted economically.


Current adaptor should be used in a function scope. (Would be great to
have a DIP for that kind of semantic check). An RC adaptor can be added
too. First we need to find a real world use case where we need to store
a random range outside of a function. -- Ilya


Yah, problem here is we can't make that @safe. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
Alexandrescu wrote:


A tip for the range adaptor: have it allocate and own the 
generator internally. That way it's easy to make it reference 
counted economically.


Current adaptor should be used in a function scope. (Would be 
great to have a DIP for that kind of semantic check). An RC 
adaptor can be added too. First we need to find a real world use 
case where we need to store a random range outside of a function. 
-- Ilya


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 01:13 PM, Ilya Yaroshenko wrote:

1. Engine must not have implicit copy semantic: it is not correct for
RNGs and has performance issues. They also should not be copied
explicitly in this example.


OK, so (lack of) copyability is a good argument. I'm ready to live with 
random generators that are noncopyable values and use opCall; then use a 
range adaptor on top of a pointer to that. It seems providing the range 
primitives for a noncopyable object is not useful and is liable to 
create more confusion and frustration than a distinct API.


A tip for the range adaptor: have it allocate and own the generator 
internally. That way it's easy to make it reference counted economically.



Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 18:13:52 UTC, Ilya Yaroshenko 
wrote:


Assume your are building a modification Mod_X ~ 1 / X + X for a 
distribution. This is how it will look in Mir Random:


EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:

On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
I started with Engines as basis. The library will be very 
different
comparing with Phobos and _any_ other RNG libraries in terms 
of floating

point generation quality. All FP generation I have seen are not
saturated (amount of possible unique FP values are very small 
comparing
with ideal situation because of IEEE arithmetic). I have not 
found the
idea described by others, so it may be an article in the 
future.


Is this a design matter, or an implementation matter? Could you 
implement the superior FP generation on the existing std.random 
API? To not be able to vs. to not want to is a different 
matter; I have an appreciation for each, but we need to 
clarify. -- Andrei


Floating point generation are implementation matter. 
Distributions and algorithms are design matter.


Assume your are building a modification Mod_X ~ 1 / X + X for a 
distribution. This is how it will look in Mir Random:


-

struct Mod(D)
  if(isRandomVariable!D && isFloatingPoint!(ReturnType!D))
{
D irv;
alias irv this;
ReturnType!D opCall(G)(ref G gen)
if(isSURBG!D)
{
ReturnType!D x1 = void;
do x1 = irv(gen);
while(x1 == 0);
ReturnType!D x2 = irv(gen);

/
/   X1 and X2 are independent! 
///



return 1 / x1 + x2;
}
}

unittest
{
auto gen = Xorshift(1);
auto rv = Mod!GammaRandomVariable(...);
auto x = rv(gen); /// generator is never copied by value.
}

-

How it can be done with Range interface instead of opCall?
Please note that users would use Distributions, not Engines.

So, Range API requirements are:

1. Engine must not have implicit copy semantic: it is not correct 
for RNGs and has performance issues. They also should not be 
copied explicitly in this example.


2. Do not use Engine's pointers in RandomVariable, because 
RandomVariables can be passed easily outside of function: they 
are just a small tuples of params.


3. Do not use classes because of BetterC and performance issues.

4. Engines must have Range interface

5. RandomVariables (both Mod and an underlaying) must have Range 
interface.


I don't think Range API for random variables can be done easily 
and without performance or security issues.


Thanks,
Ilya


Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
Interesting. Could you please add a couple of links about that? 
-- Andrei


http://xoroshiro.di.unimi.it/


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:

I started with Engines as basis. The library will be very different
comparing with Phobos and _any_ other RNG libraries in terms of floating
point generation quality. All FP generation I have seen are not
saturated (amount of possible unique FP values are very small comparing
with ideal situation because of IEEE arithmetic). I have not found the
idea described by others, so it may be an article in the future.


Is this a design matter, or an implementation matter? Could you 
implement the superior FP generation on the existing std.random API? To 
not be able to vs. to not want to is a different matter; I have an 
appreciation for each, but we need to clarify. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
That seems to be a minor matter. Of course at best some 
measurements would be in order.

Yes, it's just Joseph worries about microoptimizations.


Though we saw that the compiler can't optimize some simple 
operations like division by 2.


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d

On Wednesday, 23 November 2016 at 16:01:14 UTC, Ryan wrote:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:


auto m = ().map!(a=>1);

...?  (Untested, but I think it should work.)


Tested, works with 2.071.1


r.randomRangeAdaptor.map!(a=>1);


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d

On Wednesday, 23 November 2016 at 15:59:39 UTC, Kagamin wrote:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:

auto m = ().map!(a=>1);

...?  (Untested, but I think it should work.)


Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)


What the principal difference with randomRangeAdaptor? Anyway, 
noncopyable input range con not be used with Phobos. --Ilya


Re: Mir Random [WIP]

2016-11-23 Thread Ryan via Digitalmars-d
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:


auto m = ().map!(a=>1);

...?  (Untested, but I think it should work.)


Tested, works with 2.071.1




Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:

auto m = ().map!(a=>1);

...?  (Untested, but I think it should work.)


Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:

On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
Alexandrescu wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
 - `opCall` API instead of range interface is used (similar 
to C++)


This seems like a gratuitous departure from common D 
practice. Random
number generators are most naturally modeled in D as infinite 
ranges.

-- Andrei


It is safe low level architecture without performance and API 
issues.


I don't understand this. Can you please be more specific? I 
don't see a major issue wrt offering opCall() vs. 
front/popFront. (empty is always true.)


A range to use it with std.algorithm and std.range must be 
copyable (it is passed by value.



It
prevents users to do stupid things implicitly (like copying 
RNGs).


An input range can be made noncopyable.


Ditto. A noncopyable input range is useless.


A
hight level range interface can be added in the future (it 
will hold a

_pointer_ to an RNG).


Is there a reason to not have that now?


Done. See `RandomRangeAdaptor`:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d




In additional, when you need to write algorithms
or distributions opCall is much more convenient than range API.


Could you please be more specific? On the face of it I'd agree 
one call is less than two, but I don't see a major drawback 
here.


The main reason in implementation simplicity. Engines should be 
simple to create,
simple to maintain, and simple to use. opCall is more simple then 
range interface because
1. One declaration instead of 4 (3 range functions for plus 
latest generated value (optional))

2. Input range is useless if range is not copyable.
3. `randomRangeAdaptor` is implemented for Engines and will be 
done for Distributions too. So range API is supported better then 
in std.range (because Engines are copied).



In
additions, users would not use Engine API in 99% cases: they 
will just

want to call `rand` or `uniform`, or other distribution.

I am sure that almost any library should have low level API 
that is fits
to its implementation first. Addition API levels also may be 
added.


Is there a large difference between opCall and front/popFront?

Actually I can think of one - the matter of getting things 
started. Ranges have this awkwardness of starting the 
iteration: either you fill the current front eagerly in the 
constructor, or you have some sort of means to detect 
initialization has not yet been done and do it lazily upon the 
first use of front. The best strategy would depend on the 
actual generator, and admittedly would be a bit more of a 
headache compared to opCall. Was this the motivation?


Simplicity is main motivation.


### Example of API+implementation bug:

 Bug: RNGs has min and max params (hello C++). But, they 
are not
used when an uniform integer number is generated : 
`uniform!ulong` /

`uniform!ulong(0, 100)`.

 Solution: In Mir Rundom any RNGs must generate all 
8/16/32/64 bits

uniformly. It is RNG problem how to do it.


Min and max are not parameters, they are bounds provided by 
each generator. I agree their purpose is unclear. We could 
require all generators to provide min = 0 and max = 
UIntType.max without breaking APIs. In that case we only need 
to renounce LinearCongruentialEngine with c = 0 (see 
https://github.com/dlang/phobos/blob/master/std/random.d#L258) 
- in fact that's the main reason for introducing min and max in 
the first place. All other code stays unchanged, and we can 
easily deprecate min and max for RNGs.


(I do see min and max used by uniform at 
https://github.com/dlang/phobos/blob/master/std/random.d#L1281 
so I'm not sure I get what you mean, but anyhow the idea that 
we require RNGs to fill an uint/ulong with all random bits 
simplifies a lot of matters.)


Current Mir solution looks like pair  isURBG and isSURBG. `S` 
prefix means `T.max == ReturnType!T.max` where T is an Engine. 
So, functions use isSURBG now. The min property is not required: 
we can just subtract actual min from a returning value.


An adaptor can be added to convert URBG to Saturated URBG.

I will not fill this bug as well another dozen std.random bugs 
because
the module should be rewritten anyway and I am working on it. 
std.random
is a collection of bugs from C/C++ libraries extended with D 
generic
idioms. For example, there is no reason in 64 bit Xorshift. It 
is 32 bit
by design. Furthermore, 64 expansion of 32 bit algorithms must 
be proved
theoretically before we allow it for end users. 64 bit analogs 
are

exists, but they have another implementations.


One matter that I see is there's precious little difference 
between mir.random and std.random. Much of the code seems 
copied, which is an inefficient way to go about things. We 
shouldn't fork everything if we don't like a bit of it, though 
admittedly the path toward making changes in std is 

Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:

That seems to be a minor matter.


Yes, it's just Joseph worries about microoptimizations.

The main counter-argument is that the abstraction is not 
fitting well and another abstraction (such as opCall) is more 
advantageous.


Ranges are designed for reuse of front, which is not desirable 
for RNGs.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d

On Wednesday, 23 November 2016 at 15:25:29 UTC, Kagamin wrote:

struct A
{
bool empty;
int front;
void popFront(){}
@disable this(this);
}

import std.algorithm;
void f()
{
A r;
auto m=r.map!(a=>1);
}

Does this compile for you?


auto m = ().map!(a=>1);

...?  (Untested, but I think it should work.)


Re: Mir Random [WIP]

2016-11-23 Thread Ryan via Digitalmars-d

On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote:

On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei

Consider this okayish looking code:
consume(rng());
consume(rng.take(2)); //reuses previous value
consume(rng()); //discards unused value


Also consider the case of using 1 generator in your program, but 
calling it from different places. In one place, you use opCall 
with the popFront -> front order of calls, and in the other you 
use the range interface directly with the order reversed. You 
would re-use values there too.


By offering both interfaces together it makes it pretty easy to 
create silly bugs like this, especially in a large project.


Might be useful to have a basic RNG interface and then wrap it 
with a template that provides either, but not both, of the other 
interfaces at a time.





Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:

An input range can be made noncopyable.

If you implement it as a noncopyable input range, you get a 
large support network working for you.


struct A
{
bool empty;
int front;
void popFront(){}
@disable this(this);
}

import std.algorithm;
void f()
{
A r;
auto m=r.map!(a=>1);
}

Does this compile for you?


Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
This claim would apply to all ranges. There seems to be 
evidence it is unfounded.


The main argument for using the range interface for RNGs is 
reuse of abstraction. Minute implementation matters are much 
less important. The main counter-argument is that the 
abstraction is not fitting well and another abstraction (such 
as opCall) is more advantageous.


Consider this okayish looking code:
consume(rng());
consume(rng.take(2)); //reuses previous value
consume(rng()); //discards unused value


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 09:12 AM, Kagamin wrote:

xorshift128+ returns a temporary value computed during state update, so
it can't efficiently separate font from popFront.


That seems to be a minor matter. Of course at best some measurements 
would be in order.



Also this API is prone
to reuse some values and discard others, call popFront after front.


This claim would apply to all ranges. There seems to be evidence it is 
unfounded.


The main argument for using the range interface for RNGs is reuse of 
abstraction. Minute implementation matters are much less important. The 
main counter-argument is that the abstraction is not fitting well and 
another abstraction (such as opCall) is more advantageous.



Andrei



Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
We should add a reference RNG that transforms any other RNG 
into an input range that shares the actual RNG.


Didn't we have a generic Ref!T wrapper in std.typecons that would 
work this way? Can't find it now.


Re: Mir Random [WIP]

2016-11-23 Thread Kagamin via Digitalmars-d
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:

struct MyRNG
{
void popFront() { /* update internal state */ }

UIntType front() @property { return /* whatever part of 
internal state */; }


UIntType opCall()
{
this.popFront();
return this.front;
}
}


xorshift128+ returns a temporary value computed during state 
update, so it can't efficiently separate font from popFront. Also 
this API is prone to reuse some values and discard others, call 
popFront after front.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei 
Alexandrescu wrote:
Well I do see another small problem with 
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei


Yea, this touches on one of my more general concerns related to 
ranges, randomness and laziness.  Part of the trouble is we don't 
have a really good general design for how to implement _truly_ 
lazy ranges, where the value of `front` is not determined until 
the point where you first access it.


This has particular relevance to e.g. RandomSample and 
RandomCover.


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 08:26 AM, Joseph Rushton Wakeling wrote:

On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:

Added RandomRangeAdaptor for URBGs:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d



This has exactly the problem I identified above, though: you're
unnecessarily cacheing the latest variate rather than just using the RNG
state directly.  Not the biggest deal in the world, but avoidable if you
allow a separation between updating RNG state and accessing it.


Well I do see another small problem with 
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. 
It creates the first value eagerly upon construction. That's just a bit 
awkward. It's not the first time I'm seeing this issue, and to be honest 
am a bit bummed about it. It's a consequence of all ranges needing to 
buffer at least one element. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 13:44:00 UTC, Andrei 
Alexandrescu wrote:
Well we need to get to the bottom of this if we're to make 
progress. Otherwise it's copypasta with little changes followed 
by disappointment all the way down. -- Andrei


Would you be up for a direct discussion of this some time -- 
maybe next week once I've had time to re-review the fine details 
of my concerns?  I can provide a summary of issues as a starting 
point for that discussion, if it would be useful.


I suggest this because (i) I'm interested in your insight and 
(ii) even if we end up disagreeing on concerns and solutions, it 
would be good to make sure that we're on the same page in terms 
of understanding each other's point of view.


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:

I think that Range API is bad and useless overengineering for RNGs.


So it either "takes 1 minute" to change the interface from opCall to 
ranges (i.e. they're virtually the same thing), or it's the above. 
There's precious little overengineering that can be done in 1 minute 
:o). I see you did that in a dozen lines in RandomRangeAdaptor.


I understand you believe the range interface is unnecessary or overkill 
for random number generators. I may even agree for certain cases. 
However, there are a few arguments you may want to consider:


* By virtue of working with D, everybody know what a range is. 
Presenting the RNGs that way instantly evokes a host of knowledge, 
understanding, and idioms.


* D users (or at least a fraction) and several algorithms understand the 
notion of an infinite range and make salient decisions based on that. A 
range interface automatically taps into that.



Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 13:35:33 UTC, Ilya Yaroshenko 
wrote:
1. Current default RNG (Mt19937) has not state for the latest 
value.


That's a detail of your implementation, though, and it's not true 
for lots of other pseudo-RNGs.


2. The structure is allocated on stack and compilers can 
recognise loop patterns and eliminate addition memory movements 
for many cases.


Fair enough.  I'm still not sure, though, why you would object to 
providing public methods for pseudo-RNGs to (i) update the 
internal state and (ii) access the most recently generated 
variate, which the `opCall` would then wrap.


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 05:47 AM, Joseph Rushton Wakeling wrote:

On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:

I'm unclear on what that statistically unsafe default behavior is - my
understanding is it has to do with RNGs being inadvertently copied. It
would be great to formalize that in a well-explained issue.


I'll see if I can write that up in depth some time soon.  TBH though I
think the problem is less about RNGs and more about stuff like
RandomSample and RandomCover (and, in future, random distributions that
have internal state, like a struct implementing a normal distribution
using the Ziggurat algorithm internally).

It's not so difficult to stop RNG state being copied inadvertently, but
when you have ranges wrapping ranges wrapping ranges, each containing
their own extra state that cannot be copied by value, things get a bit
more complicated.


Well we need to get to the bottom of this if we're to make progress. 
Otherwise it's copypasta with little changes followed by disappointment 
all the way down. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 01:16 AM, Ilya Yaroshenko wrote:

The std.range is good example of degradation, it has a lot of API and
implementation  bugs.



EDIT: std.range -> std.random


Oh, that narrows it down. Thx. -- Andrei



Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 05:27 AM, Joseph Rushton Wakeling wrote:


Alternative solution: functionality that expects the full unsigned
integer word (UIntType) to be filled with random bits, should validate
that the min/max values of the generator correspond to UIntType.min and
UIntType.max.

That introduces less breaking change, creates less divergence with the
C++11 standard, and preserves more flexibility for the future.


Good idea - static assert is your friend. -- Andrei


Re: Mir Random [WIP]

2016-11-23 Thread Andrei Alexandrescu via Digitalmars-d

On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:

On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:

 - `opCall` API instead of range interface is used (similar to C++)


This seems like a gratuitous departure from common D practice. Random
number generators are most naturally modeled in D as infinite ranges.
-- Andrei


It is safe low level architecture without performance and API issues.


I don't understand this. Can you please be more specific? I don't see a 
major issue wrt offering opCall() vs. front/popFront. (empty is always 
true.)



It
prevents users to do stupid things implicitly (like copying RNGs).


An input range can be made noncopyable.


A
hight level range interface can be added in the future (it will hold a
_pointer_ to an RNG).


Is there a reason to not have that now? Again, I'm literally talking 
about offering front/popFront in lieu of opCall(). The only 
implementation difference is you keep the currently-generated number as 
a member instead of returning it from opCall. I doubt one could measure 
a performance difference.


If you implement it as a noncopyable input range, you get a large 
support network working for you. With opCall, we have virtually no such 
support - you need to do everything once again.



In additional, when you need to write algorithms
or distributions opCall is much more convenient than range API.


Could you please be more specific? On the face of it I'd agree one call 
is less than two, but I don't see a major drawback here.



In
additions, users would not use Engine API in 99% cases: they will just
want to call `rand` or `uniform`, or other distribution.

I am sure that almost any library should have low level API that is fits
to its implementation first. Addition API levels also may be added.


Is there a large difference between opCall and front/popFront?

Actually I can think of one - the matter of getting things started. 
Ranges have this awkwardness of starting the iteration: either you fill 
the current front eagerly in the constructor, or you have some sort of 
means to detect initialization has not yet been done and do it lazily 
upon the first use of front. The best strategy would depend on the 
actual generator, and admittedly would be a bit more of a headache 
compared to opCall. Was this the motivation?



Current Phobos evolution is generic degradation: more generic and
"universal" code hide more uncovered bugs in the code. The std.range is
good example of degradation, it has a lot of API and implementation  bugs.


Do you have examples of issues outside random number generators?


### Example of API+implementation bug:

 Bug: RNGs has min and max params (hello C++). But, they are not
used when an uniform integer number is generated : `uniform!ulong` /
`uniform!ulong(0, 100)`.

 Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits
uniformly. It is RNG problem how to do it.


Min and max are not parameters, they are bounds provided by each 
generator. I agree their purpose is unclear. We could require all 
generators to provide min = 0 and max = UIntType.max without breaking 
APIs. In that case we only need to renounce LinearCongruentialEngine 
with c = 0 (see 
https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact 
that's the main reason for introducing min and max in the first place. 
All other code stays unchanged, and we can easily deprecate min and max 
for RNGs.


(I do see min and max used by uniform at 
https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm 
not sure I get what you mean, but anyhow the idea that we require RNGs 
to fill an uint/ulong with all random bits simplifies a lot of matters.)



I will not fill this bug as well another dozen std.random bugs because
the module should be rewritten anyway and I am working on it. std.random
is a collection of bugs from C/C++ libraries extended with D generic
idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit
by design. Furthermore, 64 expansion of 32 bit algorithms must be proved
theoretically before we allow it for end users. 64 bit analogs are
exists, but they have another implementations.


One matter that I see is there's precious little difference between 
mir.random and std.random. Much of the code seems copied, which is an 
inefficient way to go about things. We shouldn't fork everything if we 
don't like a bit of it, though admittedly the path toward making changes 
in std is more difficult. Is your intent to work on mir.random on the 
side and then submit it as a wholesale replacement of std.random under a 
different name? In that case you'd have my support, but you'd need to 
convince me the replacement is necessary. You'd probably have a good 
case for eliminating xorshift/64, but then we may simply deprecate that 
outright. You'd possibly have a more difficult time with opCall.



Phobos degrades because
we add a lot of 

Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:

 - 64-bit Mt19937 initialization is fixed
 - 64-bit Mt19937 is default for 64-bit targets


I wonder why Mersenne Twister is *always* choosen over other 
algorithms.

My vote goes for CMWC, of course.

Andrea


A PR for CMWC is highly welcome!


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton 
Wakeling wrote:
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
wrote:

Added RandomRangeAdaptor for URBGs:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d


This has exactly the problem I identified above, though: you're 
unnecessarily cacheing the latest variate rather than just 
using the RNG state directly.  Not the biggest deal in the 
world, but avoidable if you allow a separation between updating 
RNG state and accessing it.


1. Current default RNG (Mt19937) has not state for the latest 
value.
2. The structure is allocated on stack and compilers can 
recognise loop patterns and eliminate addition memory movements 
for many cases.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
I wonder why Mersenne Twister is *always* choosen over other 
algorithms.


The weight of history, I suspect.  Mersenne Twister was the major 
new high-quality RNG back when people started getting really 
concerned about having good defaults, and when the Diehard Tests 
were the state of the art in tests of randomness.  IIRC there's 
also a benefit in terms of dimensionality, which some more recent 
generators don't address, which can make it a safer "all-round 
default".


Agree that there are probably better choices for today, though.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
wrote:

Added RandomRangeAdaptor for URBGs:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d


This has exactly the problem I identified above, though: you're 
unnecessarily cacheing the latest variate rather than just using 
the RNG state directly.  Not the biggest deal in the world, but 
avoidable if you allow a separation between updating RNG state 
and accessing it.


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
Yes, most uses of RNGs in std.random involve calling `front` 
and then `popFront()` (although it would probably be better the 
other way round).  But it's readily possible to imagine 
range-based use-cases for random distributions along the lines 
of,


myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 
0).somethingElse.take(20);


But what I'd say more broadly is that of what I've seen so far, 
mir.random is conflating too many breaking changes without 
giving thought for their impact (for example, converting the 
`isUniformRNG` check to rely on a UDA is IMO problematic; I can 
file a GitHub issue explaining the reasons for this).  Allowing 
for the wider goals of the exercise, it could be worth giving 
some thought to which of these breakages is really needed to 
support your use-cases, and which can be avoided.


Added RandomRangeAdaptor for URBGs:
https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d


Re: Mir Random [WIP]

2016-11-23 Thread Andrea Fontana via Digitalmars-d
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:

 - 64-bit Mt19937 initialization is fixed
 - 64-bit Mt19937 is default for 64-bit targets


I wonder why Mersenne Twister is *always* choosen over other 
algorithms.

My vote goes for CMWC, of course.

Andrea



Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
wrote:
For example, Mir Random basic utilities (more low level then 
distributions):

https://github.com/libmir/mir-random/blob/master/source/random/package.d

Also you can explore std.random code. opCall would almost 
always more convenient to use.


Yes, but as described, `opCall` can easily be created as a 
composition of `popFront` and `front` for these convenience 
purposes.



We don't restrict. It is 1 minute to write an Range wrapper.


If all you have is `opCall` then the range wrapper is less 
efficient than it need be.


Inlining will solve this problem with data duplication (if any) 
for most real world cases.




This wrapper can be added to the library, if we found a real 
world use case. Again, I have not seen a real world algorithm, 
which looks better with Range API for generators. RandomSample 
can be created without Range API, and it would look more 
convenient then it is now. I think that Range API is bad and 
useless overengineering for RNGs.


Yes, most uses of RNGs in std.random involve calling `front` 
and then `popFront()` (although it would probably be better the 
other way round).  But it's readily possible to imagine 
range-based use-cases for random distributions along the lines 
of,


myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 
0).somethingElse.take(20);


But what I'd say more broadly is that of what I've seen so far, 
mir.random is conflating too many breaking changes without 
giving thought for their impact (for example, converting the 
`isUniformRNG` check to rely on a UDA is IMO problematic; I can 
file a GitHub issue explaining the reasons for this).  Allowing 
for the wider goals of the exercise, it could be worth giving 
some thought to which of these breakages is really needed to 
support your use-cases, and which can be avoided.


Yes, please file a GitHub issue.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
wrote:
For example, Mir Random basic utilities (more low level then 
distributions):

https://github.com/libmir/mir-random/blob/master/source/random/package.d

Also you can explore std.random code. opCall would almost 
always more convenient to use.


Yes, but as described, `opCall` can easily be created as a 
composition of `popFront` and `front` for these convenience 
purposes.



We don't restrict. It is 1 minute to write an Range wrapper.


If all you have is `opCall` then the range wrapper is less 
efficient than it need be.


This wrapper can be added to the library, if we found a real 
world use case. Again, I have not seen a real world algorithm, 
which looks better with Range API for generators. RandomSample 
can be created without Range API, and it would look more 
convenient then it is now. I think that Range API is bad and 
useless overengineering for RNGs.


Yes, most uses of RNGs in std.random involve calling `front` and 
then `popFront()` (although it would probably be better the other 
way round).  But it's readily possible to imagine range-based 
use-cases for random distributions along the lines of,


myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 
0).somethingElse.take(20);


But what I'd say more broadly is that of what I've seen so far, 
mir.random is conflating too many breaking changes without giving 
thought for their impact (for example, converting the 
`isUniformRNG` check to rely on a UDA is IMO problematic; I can 
file a GitHub issue explaining the reasons for this).  Allowing 
for the wider goals of the exercise, it could be worth giving 
some thought to which of these breakages is really needed to 
support your use-cases, and which can be avoided.


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
It is safe low level architecture without performance and API 
issues. It prevents users to do stupid things implicitly (like 
copying RNGs). A hight level range interface can be added in 
the future (it will hold a _pointer_ to an RNG).


Note that if you want to do this, it's convenient to still 
preserve a separation between popping the RNG state versus 
getting the current variate.  Otherwise, the range interface 
will wind up having to separately cache the variate value, 
which is wasteful.


Something like this:

struct MyRNG
{
void popFront() { /* update internal state */ }

UIntType front() @property { return /* whatever part of 
internal state */; }


UIntType opCall()
{
this.popFront();
return this.front;
}
}

(The above is basically just the input range API less the 
`empty` property, and the `front` and `popFront()` are arguably 
a lower-level API than `opCall`.)


In additional, when you need to write algorithms or 
distributions opCall is much more convenient than range API.


Can you give some examples of what you mean here?


For example, Mir Random basic utilities (more low level then 
distributions):

https://github.com/libmir/mir-random/blob/master/source/random/package.d

Also you can explore std.random code. opCall would almost always 
more convenient to use.


 In additions, users would not use Engine API in 99% cases: 
they will just want to call `rand` or `uniform`, or other 
distribution.


I don't think that's necessarily true, but in any case, we 
shouldn't restrict the use-cases of the 1% unless we have to.


We don't restrict. It is 1 minute to write an Range wrapper. This 
wrapper can be added to the library, if we found a real world use 
case. Again, I have not seen a real world algorithm, which looks 
better with Range API for generators. RandomSample can be created 
without Range API, and it would look more convenient then it is 
now. I think that Range API is bad and useless overengineering 
for RNGs.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko 
wrote:
It is only question of `Random` alias, which can be changed in 
the future anyway. Both Mt19937_32 and Mt19937_64 are defined.


I think we're at an impasse in terms of priorities, because 
that's exactly the reason that I think you should leave the 
Random alias pointing to the same generator, and let the people 
with speed/quality concerns select the alternative generator ;-)


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton 
Wakeling wrote:
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
wrote:

[...]


All of which is fine in its own terms, but why prioritize the 
speed of the default behaviour over its reliability and 
reproducibility?


Anyone who cares about that combination of speed and 
statistical quality will have enough information in the 
documentation to know what to do.  By contrast producing 
different results for identical user code depending on whether 
you're making a 32- or 64-bit build is an unpleasant 
complication it could be better to avoid.


In any case, given what you say above, shouldn't the choice of 
32- vs. 64-bit RNG depend on whether one is using a 
distribution that generates 32- vs. 64-bit floating-point 
variates, rather than whether one is building for a 32- or 
64-bit target?  In which case it needs to be a user choice, 
since it's only the user who knows what distribution they're 
using.


We have a Random alias. I think it is OK if it generates 
different numbers for different arch and library versions. If a 
user want to exact the same behaviour he can use explicit names 
like Mt19937_32 and Mt19937_64. Both are defined for all 
architectures.


64-bit has not significant speed degradation on 64-bit machines 
for 32-bit random number generation. Maybe only few %. So it is 
better for 64-bit machines.


It is only question of `Random` alias, which can be changed in 
the future anyway. Both Mt19937_32 and Mt19937_64 are defined.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
It is safe low level architecture without performance and API 
issues. It prevents users to do stupid things implicitly (like 
copying RNGs). A hight level range interface can be added in 
the future (it will hold a _pointer_ to an RNG).


Note that if you want to do this, it's convenient to still 
preserve a separation between popping the RNG state versus 
getting the current variate.  Otherwise, the range interface will 
wind up having to separately cache the variate value, which is 
wasteful.


Something like this:

struct MyRNG
{
void popFront() { /* update internal state */ }

UIntType front() @property { return /* whatever part of 
internal state */; }


UIntType opCall()
{
this.popFront();
return this.front;
}
}

(The above is basically just the input range API less the `empty` 
property, and the `front` and `popFront()` are arguably a 
lower-level API than `opCall`.)


In additional, when you need to write algorithms or 
distributions opCall is much more convenient than range API.


Can you give some examples of what you mean here?

 In additions, users would not use Engine API in 99% cases: 
they will just want to call `rand` or `uniform`, or other 
distribution.


I don't think that's necessarily true, but in any case, we 
shouldn't restrict the use-cases of the 1% unless we have to.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
I'm unclear on what that statistically unsafe default behavior 
is - my understanding is it has to do with RNGs being 
inadvertently copied. It would be great to formalize that in a 
well-explained issue.


I'll see if I can write that up in depth some time soon.  TBH 
though I think the problem is less about RNGs and more about 
stuff like RandomSample and RandomCover (and, in future, random 
distributions that have internal state, like a struct 
implementing a normal distribution using the Ziggurat algorithm 
internally).


It's not so difficult to stop RNG state being copied 
inadvertently, but when you have ranges wrapping ranges wrapping 
ranges, each containing their own extra state that cannot be 
copied by value, things get a bit more complicated.


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
wrote:
Mir Random is going to be a library with saturated uniform FP 
RNGs and almost saturated exponential FP RNGs. Comparing with 
all other libraries (any language) the basic uniform FP numbers 
will be generated in interval (-1, +1) and contains _all_ 
possible values including all subnormal numbers. 64 bit 
generators are 2 times faster for this task if you need to 
generate a 64 bit floating point number. Explanation of 
technique will be in my post/article. --Ilya


All of which is fine in its own terms, but why prioritize the 
speed of the default behaviour over its reliability and 
reproducibility?


Anyone who cares about that combination of speed and statistical 
quality will have enough information in the documentation to know 
what to do.  By contrast producing different results for 
identical user code depending on whether you're making a 32- or 
64-bit build is an unpleasant complication it could be better to 
avoid.


In any case, given what you say above, shouldn't the choice of 
32- vs. 64-bit RNG depend on whether one is using a distribution 
that generates 32- vs. 64-bit floating-point variates, rather 
than whether one is building for a 32- or 64-bit target?  In 
which case it needs to be a user choice, since it's only the user 
who knows what distribution they're using.


Re: Mir Random [WIP]

2016-11-23 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton 
Wakeling wrote:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:

### Example of API+implementation bug:

 Bug: RNGs has min and max params (hello C++). But, they 
are not used when an uniform integer number is generated : 
`uniform!ulong` / `uniform!ulong(0, 100)`.


 Solution: In Mir Rundom any RNGs must generate all 
8/16/32/64 bits uniformly. It is RNG problem how to do it.


Alternative solution: functionality that expects the full 
unsigned integer word (UIntType) to be filled with random bits, 
should validate that the min/max values of the generator 
correspond to UIntType.min and UIntType.max.


That introduces less breaking change, creates less divergence 
with the C++11 standard, and preserves more flexibility for the 
future.


Good point, will add this. --Ilya


Re: Mir Random [WIP]

2016-11-23 Thread Joseph Rushton Wakeling via Digitalmars-d
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:

### Example of API+implementation bug:

 Bug: RNGs has min and max params (hello C++). But, they 
are not used when an uniform integer number is generated : 
`uniform!ulong` / `uniform!ulong(0, 100)`.


 Solution: In Mir Rundom any RNGs must generate all 
8/16/32/64 bits uniformly. It is RNG problem how to do it.


Alternative solution: functionality that expects the full 
unsigned integer word (UIntType) to be filled with random bits, 
should validate that the min/max values of the generator 
correspond to UIntType.min and UIntType.max.


That introduces less breaking change, creates less divergence 
with the C++11 standard, and preserves more flexibility for the 
future.


Re: Mir Random [WIP]

2016-11-22 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
Alexandrescu wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
 - `opCall` API instead of range interface is used (similar 
to C++)


This seems like a gratuitous departure from common D practice. 
Random number generators are most naturally modeled in D as 
infinite ranges. -- Andrei


It is safe low level architecture without performance and API 
issues. It prevents users to do stupid things implicitly (like 
copying RNGs). A hight level range interface can be added in 
the future (it will hold a _pointer_ to an RNG). In additional, 
when you need to write algorithms or distributions opCall is 
much more convenient than range API. In additions, users would 
not use Engine API in 99% cases: they will just want to call 
`rand` or `uniform`, or other distribution.


I am sure that almost any library should have low level API 
that is fits to its implementation first. Addition API levels 
also may be added. Current Phobos evolution is generic 
degradation: more generic and "universal" code hide more 
uncovered bugs in the code. The std.range is good example of 
degradation, it has a lot of API and implementation  bugs.




EDIT: std.range -> std.random




Re: Mir Random [WIP]

2016-11-22 Thread Ilya Yaroshenko via Digitalmars-d
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
 - `opCall` API instead of range interface is used (similar to 
C++)


This seems like a gratuitous departure from common D practice. 
Random number generators are most naturally modeled in D as 
infinite ranges. -- Andrei


It is safe low level architecture without performance and API 
issues. It prevents users to do stupid things implicitly (like 
copying RNGs). A hight level range interface can be added in the 
future (it will hold a _pointer_ to an RNG). In additional, when 
you need to write algorithms or distributions opCall is much more 
convenient than range API. In additions, users would not use 
Engine API in 99% cases: they will just want to call `rand` or 
`uniform`, or other distribution.


I am sure that almost any library should have low level API that 
is fits to its implementation first. Addition API levels also may 
be added. Current Phobos evolution is generic degradation: more 
generic and "universal" code hide more uncovered bugs in the 
code. The std.range is good example of degradation, it has a lot 
of API and implementation  bugs.


### Example of API+implementation bug:

 Bug: RNGs has min and max params (hello C++). But, they are 
not used when an uniform integer number is generated : 
`uniform!ulong` / `uniform!ulong(0, 100)`.


 Solution: In Mir Rundom any RNGs must generate all 
8/16/32/64 bits uniformly. It is RNG problem how to do it.


I will not fill this bug as well another dozen std.random bugs 
because the module should be rewritten anyway and I am working on 
it. std.random is a collection of bugs from C/C++ libraries 
extended with D generic idioms. For example, there is no reason 
in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 
expansion of 32 bit algorithms must be proved theoretically 
before we allow it for end users. 64 bit analogs are exists, but 
they have another implementations. Phobos degrades because we add 
a lot of generic specializations and small utilities without 
understanding use cases. Phobos really follows stupid idealistic 
idea: more generic is better, more API is better, more universal 
algorithms is better. The problems is that Phobos/DRuntime is 
soup where all (because its "universality") interacts with 
everything.


Re: Mir Random [WIP]

2016-11-22 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton 
Wakeling wrote:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:

 - 64-bit Mt19937 is default for 64-bit targets


This means that seemingly identical code will produce different 
results depending on whether it's compiled for 64-bit or 
32-bit.  Is that really worth it, when anyone who cares about 
the difference between 32-bit vs. 64-bit random words is quite 
capable of specifying the RNG they want to use and not just 
relying on the default?


Having a different default RNG would make sense for targets 
where there are serious performance issues at stake (e.g. 
minimal memory available for RNG state) but just for the sake 
of 32- vs. 64-bit Mersenne Twister seems an unnecessary 
complication.


These days it's debatable whether Mersenne Twister of _any_ 
word size is the optimal choice for a default RNG, so if the 
default is going to be changed, it might as well be to 
something significantly better (rather than worrying about 
numbers of bits).


Mir Random is going to be a library with saturated uniform FP 
RNGs and almost saturated exponential FP RNGs. Comparing with all 
other libraries (any language) the basic uniform FP numbers will 
be generated in interval (-1, +1) and contains _all_ possible 
values including all subnormal numbers. 64 bit generators are 2 
times faster for this task if you need to generate a 64 bit 
floating point number. Explanation of technique will be in my 
post/article. --Ilya


Re: Mir Random [WIP]

2016-11-22 Thread ketmar via Digitalmars-d
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:

On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:
These days it's debatable whether Mersenne Twister of _any_ 
word size is

the optimal choice for a default RNG


Interesting. Could you please add a couple of links about that? 
-- Andrei


http://www.pcg-random.org/other-rngs.html

;-)


Re: Mir Random [WIP]

2016-11-22 Thread Andrei Alexandrescu via Digitalmars-d

On 11/22/16 7:30 PM, John Colvin wrote:

On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:

 - `opCall` API instead of range interface is used (similar to C++)


This seems like a gratuitous departure from common D practice. Random
number generators are most naturally modeled in D as infinite ranges.
-- Andrei


I'm pretty sure everyone *wants* it to be a range, but it turns out it's
a difficult design space. Lot's of nasty trade-offs that can be (and
have been) argued back and forth depending on your priorities.


The proposed design has disabled copy ctor and uses opCall() for a new 
element. That seems to be a difference without a distinction from an 
input range that has disabled copy ctor and offers the input range 
primitives.



Perhaps you have an insight that has been missed that can make a good
rng range without causing less than optimal performance or statistically
unsafe default behaviour?


We should add a reference RNG that transforms any other RNG into an 
input range that shares the actual RNG.



IIRC you think people are making too much fuss about the statistically
safe default behaviour, but it would be interesting to hear a more
expanded version of that view.


I'm unclear on what that statistically unsafe default behavior is - my 
understanding is it has to do with RNGs being inadvertently copied. It 
would be great to formalize that in a well-explained issue.



Andrei



Re: Mir Random [WIP]

2016-11-22 Thread Andrei Alexandrescu via Digitalmars-d

On 11/22/16 7:36 PM, Joseph Rushton Wakeling wrote:


  * random _generators_, i.e. sources of uniformly distributed random bits:

- random _engines_ (seedable, pseudo-random algorithms)

- random _devices_ (non-deterministic sources of uniformly
distributed bits)

  * random _distributions_, which transform uniformly-distributed random
bits
into variates with some other type and distribution

- note _this includes uniformly-distributed integers_!

- also uniformly-distributed floats, enums, etc.

- and also non-uniform distributions

  * random _algorithms_, i.e. algorithms in the sense of std.random, but
where
their popFront() includes random decision-making

- randomCover, randomSample, etc.


Yah, I think that would be nice. -- Andrei



Re: Mir Random [WIP]

2016-11-22 Thread Andrei Alexandrescu via Digitalmars-d

On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:

These days it's debatable whether Mersenne Twister of _any_ word size is
the optimal choice for a default RNG


Interesting. Could you please add a couple of links about that? -- Andrei



Re: Mir Random [WIP]

2016-11-22 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:

 - 64-bit Mt19937 is default for 64-bit targets


This means that seemingly identical code will produce different 
results depending on whether it's compiled for 64-bit or 32-bit.  
Is that really worth it, when anyone who cares about the 
difference between 32-bit vs. 64-bit random words is quite 
capable of specifying the RNG they want to use and not just 
relying on the default?


Having a different default RNG would make sense for targets where 
there are serious performance issues at stake (e.g. minimal 
memory available for RNG state) but just for the sake of 32- vs. 
64-bit Mersenne Twister seems an unnecessary complication.


These days it's debatable whether Mersenne Twister of _any_ word 
size is the optimal choice for a default RNG, so if the default 
is going to be changed, it might as well be to something 
significantly better (rather than worrying about numbers of bits).


Re: Mir Random [WIP]

2016-11-22 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:

# Integer uniform generators
[WIP]

# Real uniform generators
[WIP]

# Nonuniform generators
[WIP]


As we discussed in relation to Seb's project, I think this is a 
problematic conceptualization of the best way to structure 
functionality related to randomness.


An arguably better way (as outlined in the C++11 standard) is to 
think in terms of:


  * random _generators_, i.e. sources of uniformly distributed 
random bits:


- random _engines_ (seedable, pseudo-random algorithms)

- random _devices_ (non-deterministic sources of uniformly 
distributed bits)


  * random _distributions_, which transform uniformly-distributed 
random bits

into variates with some other type and distribution

- note _this includes uniformly-distributed integers_!

- also uniformly-distributed floats, enums, etc.

- and also non-uniform distributions

  * random _algorithms_, i.e. algorithms in the sense of 
std.random, but where

their popFront() includes random decision-making

- randomCover, randomSample, etc.

The point of the above breakdown is that it gives a nice and 
clear separation of concerns that allows for easily replaceable 
components.  Separating out stuff just by the ultimate result you 
want (integers vs. floats, uniform vs. non-uniform, etc.) isn't 
helpful in that way.


Re: Mir Random [WIP]

2016-11-22 Thread John Colvin via Digitalmars-d
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
 - `opCall` API instead of range interface is used (similar to 
C++)


This seems like a gratuitous departure from common D practice. 
Random number generators are most naturally modeled in D as 
infinite ranges. -- Andrei


I'm pretty sure everyone *wants* it to be a range, but it turns 
out it's a difficult design space. Lot's of nasty trade-offs that 
can be (and have been) argued back and forth depending on your 
priorities.


Perhaps you have an insight that has been missed that can make a 
good rng range without causing less than optimal performance or 
statistically unsafe default behaviour?


IIRC you think people are making too much fuss about the 
statistically safe default behaviour, but it would be interesting 
to hear a more expanded version of that view.


Re: Mir Random [WIP]

2016-11-22 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
 - `opCall` API instead of range interface is used (similar to 
C++)


This seems like a gratuitous departure from common D practice. 
Random number generators are most naturally modeled in D as 
infinite ranges. -- Andrei


Yes, I think this is avoiding the existing problems with RNGs and 
ranges rather than solving them.


I don't blame anyone for _wanting_ to avoid them; they are nasty, 
subtle issues that seem to keep getting more complex the more one 
looks at them (for example, after my DConf talk last year, I 
realized that there were a whole set of other potential 
complications related to how ranges typically treat laziness).  
But I think they can be solved, and should be.


OTOH, there's no reason per se why there should not be an 
`opCall` for random number generators along the lines of,


UIntType opCall()
{
this.popFront();
return this.front;
}

... just to provide options to the user.  (BTW, note the order 
there, which touches on the issues related to what lazy 
evaluation means not just for RNGs but for any non-deterministic 
IO.)


Re: Mir Random [WIP]

2016-11-22 Thread Andrei Alexandrescu via Digitalmars-d

On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:

 - `opCall` API instead of range interface is used (similar to C++)


This seems like a gratuitous departure from common D practice. Random 
number generators are most naturally modeled in D as infinite ranges. -- 
Andrei