Re: Dconf 2015 talks...

2016-02-08 Thread John Colvin via Digitalmars-d
On Monday, 8 February 2016 at 19:46:19 UTC, Joseph Rushton 
Wakeling wrote:

[snip]


This might be a stupid idea, but perhaps there's something useful 
in it:


Determinism isn't the same thing as "one long chain of numbers 
that everybody reads from".


It can be acceptable to seed a set of reasonable pseudo-random 
number generators with consecutive integers (indeed seeding 
randomly can be dangerous because of the birthday problem). More 
generally, any change of the state of the rng in "seed-space" 
should produce an output equivalent to taking a sample from the 
output distribution.


Can you not have a random number generator make a copy of itself 
like this:


struct RNG
{
State state;
static State.ModifierT modifier;
this(this)
{
this.state.alterBy(modifier++);
//recalculate output sample etc...
}
}

Then any time you copy a RNG, the copy is kicked on to a new path 
in state-space. Basically we're deterministically re-seeding on 
copy.


Re: Dconf 2015 talks...

2016-02-08 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 8 February 2016 at 00:54:24 UTC, Era Scarecrow wrote:
 Either they use more stack space, or they act normally after 
their call is done and are deallocated normally (automatically, 
unless they are passed outside of the scope where they were 
generated).


It's that "passed outside of the scope where they were generated" 
that is bothering me.


Consider:

auto foo (Range) (Range r)
if (isInputRange!Range)
{
return r.randomSample(100).take(10);
}

void main ()
{
iota(1000).foo.writeln;
}

If the RandomSample internals are allocated on the stack using 
the technique you propose, what's going to happen here?



 True; Perhaps have one RNG for seeding and one RNG for 
passing, then reseed after passing the function off, although 
how far deep some of this could go with it's deeper copying; I 
don't know.


No, I really don't think that's an approach that's worth 
pursuing, except as a desperate workaround for the faults of the 
existing design.  RNGs should as much as possible "just work" and 
the user shouldn't have to concern themselves like this with



 Perhaps RNG should be a class outright, which probably removes 
a lot of these problems.


It does indeed give you the desired reference semantics very 
cheaply.  But now suppose you have lots of RandomSamples being 
instantiated in the inner loop of your program.  If _they_ are 
also a class, how do you handle their instantiation and 
deallocation?


Re: Dconf 2015 talks...

2016-02-07 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton 
Wakeling wrote:
I have been wondering about how allocators could help to deal 
with these problems.  Could you put forward a minimal example 
of how you would see it working?


 Most likely alloca would have to be built into the compiler. 
Here's a crash course in how the stack memory management works. 
sp=stack pointer, bp=base pointer (more relevant pre 386).


Apologies for the delay in writing back about this.

What you describe makes sense, but I don't quite follow what you 
mean in one particular case:


 Technically alloca simply returns the current sp, then adds to 
it the number of bytes you requested. This means you have to 
run it at the function stack where you want to use it (and not 
in a called function, otherwise corruption). So inlined 
functions where alloca's data would remain would be a must.


I don't quite follow your remark about inlined functions; do you 
mean that the function where the RNG instance is generated must 
be inlined?  (That would make sense in order to avoid the 
internal state being deallocated immediately.)


I think there might be more complications here than just 
allocating individual RNG instances, though (which can happen 
quite early on in the program); what about stuff like random 
algorithms (RandomCover, RandomSample) which might be generated 
deep in internal loops, passed to other functionality as rvalues, 
etc. etc.?



Then it comes down to a simple:

[code]
struct RNG {
  int *state; //add assert to functions that this isn't null
  this(int seed) {
state = alloca(sizeof(int));
*state = seed;
  }
}
[/code]


Yes, missing your understanding of the details of how it would 
have to happen, this is pretty much what I had in mind for random 
ranges; a pointer to internal state nevertheless allocated on the 
stack.  But the already-mentioned concerns about some of the ways 
that stack could be deallocated make for some concerns.


It might be simpler, in practice, to just have the state 
refcounted.


 I suppose the alternate is an option to skip/throw-away some 
numbers that should've been consumed (assuming you want to 
keep using the same seed), or seeding each per use.


I'm not sure I follow what you mean here or why you think this 
would work?  Could you give a concrete example?


certainly.

[code]
struct RNG {
  ...

  void skip(int x) {assert(x>0); while(x--) popfront();}
}

RNG rnd = RND(seed);
rnd.take(10).writeln;  //loosely based on talk examples
rnd.skip(10);  //skips the 'consumed' numbers.
rnd.take(10).writeln;  //won't have identical output

[/code]


I'm afraid that's not really viable :-(  In the best case, it's 
just working around the fundamental design problem via programmer 
virtue.  But the problem is, in the general case, you can't 
anticipate how many random variates may be popped from your 
random number generator inside a function (it might depend on 
other input factors over which you as programmer have no control).


Re: Dconf 2015 talks...

2016-02-07 Thread Era Scarecrow via Digitalmars-d
On Sunday, 7 February 2016 at 22:27:40 UTC, Joseph Rushton 
Wakeling wrote:

On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
What you describe makes sense, but I don't quite follow what 
you mean in one particular case:


 Technically alloca simply returns the current sp, then adds 
to it the number of bytes you requested. This means you have 
to run it at the function stack where you want to use it (and 
not in a called function, otherwise corruption). So inlined 
functions where alloca's data would remain would be a must.


 That's the low level assembly language that's generated i'm 
referring to; At least based on what i've read and seen for code 
output from a compiler to a .s file or similar.


I don't quite follow your remark about inlined functions; do 
you mean that the function where the RNG instance is generated 
must be inlined?  (That would make sense in order to avoid the 
internal state being deallocated immediately.)


Assuming alloca moves to the inlined function. Although i had 
another idea thrown in my head where the memory would be 
pre-allocated and you could just point to it when requested via 
an allocator. So assume


@alloca(sizeof(int)) struct RNG {

 During instantiation it would know the size ahead of time and 
just append that to the end of the structure. That extra padding 
space could be handled manually instead.


 this(int seed) {
   state = cast(void*)(this+1);
 }

 But this forced type breaking is quite clunky (and obviously the 
wrong way to write it).


 I recall in C there was suppose to be a way to attach an array 
(of unknown size) immediately following a struct by making the 
length of the array 0, then accessing it directly. But you'd 
still need to guarantee somehow that the access rights are in 
place and not referencing other data which you could screw up 
(via optimizations or something).


I think there might be more complications here than just 
allocating individual RNG instances, though (which can happen 
quite early on in the program); what about stuff like random 
algorithms (RandomCover, RandomSample) which might be generated 
deep in internal loops, passed to other functionality as 
rvalues, etc. etc.?


 Either they use more stack space, or they act normally after 
their call is done and are deallocated normally (automatically, 
unless they are passed outside of the scope where they were 
generated).


It might be simpler, in practice, to just have the state 
refcounted.


 I suppose the alternate is an option to skip/throw-away 
some numbers that should've been consumed
I'm not sure I follow what you mean here or why you think 
this would work?  Could you give a concrete example?



  void skip(int x) {assert(x>0); while(x--) popfront();}



rnd.take(10).writeln;  //loosely based on talk examples
rnd.skip(10);  //skips the 'consumed' numbers.
rnd.take(10).writeln;  //won't have identical output




I'm afraid that's not really viable :-(
But the problem is, in the general case, you can't anticipate 
how many random variates may be popped from your random number 
generator inside a function.


 True; Perhaps have one RNG for seeding and one RNG for passing, 
then reseed after passing the function off, although how far deep 
some of this could go with it's deeper copying; I don't know.


 Perhaps RNG should be a class outright, which probably removes a 
lot of these problems.


Re: Dconf 2015 talks...

2016-01-26 Thread Andrei Alexandrescu via Digitalmars-d

On 01/26/2016 02:07 AM, Joseph Rushton Wakeling wrote:

On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu wrote:

I see no problem with adding a category of rngs that are not forward.
Naming is of course the hardest problem in our community :o). A good
stard would be a /dev/random wrapper. -- Andrei


It's not about different categories of RNG in this case -- really, NO
RNGs should be forward ranges.


I think a pseudo-rng as a forward range is useful. It's good in testing 
and experimentation to fork a sequence of pseudo-random numbers, turn 
the clock back, etc. Essentially I see no harm in it; it's always easy 
to make a forward range into an input range, it's the opposite that's 
hard. -- Andrei





Re: Dconf 2015 talks...

2016-01-26 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 26 January 2016 at 12:07:59 UTC, Andrei Alexandrescu 
wrote:
I think a pseudo-rng as a forward range is useful. It's good in 
testing and experimentation to fork a sequence of pseudo-random 
numbers, turn the clock back, etc. Essentially I see no harm in 
it; it's always easy to make a forward range into an input 
range, it's the opposite that's hard. -- Andrei


Yes, but there are other ways to fork that sequence that don't 
involve implementing the .save primitive.  That's why in my talk 
(and here) I distinguish between the .save primitive versus some 
other possible primitive -- say, .dup -- whose promise is, "You 
can use me to duplicate the state of this range, but you must use 
me carefully and in an appropriate context."


This is important, because if you just define pseudo-RNGs as 
forward ranges, most people won't read the small print about when 
you need to wrap it in an input range (which is pretty much any 
time you want to pass it into phobos code and have it behave like 
an actual source of randomness).  As a result, they'll get 
unintended and possibly unnoticed statistical correlations -- 
annoying for something like a computer game, possibly a serious 
consequence if occurring in something like a research simulation.


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d
On Monday, 25 January 2016 at 23:37:25 UTC, Andrei Alexandrescu 
wrote:

On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:
One option would be to implement the basic RNG data structor à 
la C++,

as a functor


That's semantically the same as an input range. -- Andrei


“Yes, but...” :-P

There are actually some interesting subtleties required for the 
input range design, not just for RNGs but for ANY range whose 
elements are generated randomly.


I will try and write this up properly later, but the TL;DR is, it 
involves doing extra work to ensure every element is _truly_ 
lazily evaluated.


To some extent, splitting into functor and range wrapper can help 
clarify the code design there (even if the functor stays hidden 
as an implementation detail).


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d
On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu 
wrote:
I see no problem with adding a category of rngs that are not 
forward. Naming is of course the hardest problem in our 
community :o). A good stard would be a /dev/random wrapper. -- 
Andrei


It's not about different categories of RNG in this case -- 
really, NO RNGs should be forward ranges.


If ONLY it was just a naming problem... :-(


Re: Dconf 2015 talks...

2016-01-25 Thread Dicebot via Digitalmars-d
Main problem is with both making it @safe and avoid unforced 
allocations (i.e. allowing shared state to be kept on stack of 
`main`).


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 13:05:46 UTC, Era Scarecrow wrote:
 Finally getting around to watching all the talks, all good 
stuff :) Figured I'd make a thread talking about things I came 
across and perhaps get into a discussion on them in detail.


 Anyways I'm watching the talk with Joseph Wakeling involving 
RNG and I wonder, is that still a problem or not?


It's still an issue, at least partly because amid all other 
things in life, I haven't had much opportunity to sit down and 
focus on it :-(


 I ask since the simplest and most correct thing I can think of 
is to use the heap for a referenced state, and dup would then 
duplicate the state properly while otherwise the RNG problems 
go away that most of the talk went on about.


 It's possible I've been out of the loop and this has already 
been solved. Not sure yet, I haven't looked closely at any of 
the sources or the language updates.


 So... time for some rusty D prototyping.

[code]
struct RNG {
  int* state;

  this(int seed) {
state = new int;
*state = seed;
  }

  //if you provide a pointer, we don't rely on the heap/gc at 
all...
  //maybe a small stack based allocator could be useful for 
this?

  this(int* seed) { state = seed; }

  //dup instead of save, so it's not a forward range
  RNG dup() { return RNG(*state); }

  //popfront, front & empty as expected
}
[/code]


The major problems are not so much with RNGs per se (note that 
your approach can actually be much more nicely implemented as a 
final class).  It's all the functionality that _wraps_ RNGs, such 
as random algorithms like randomCover and randomSample, or a D 
equivalent of C++'s random distributions.  These need to also be 
reference types (for their internal state as well as the RNG they 
wrap), and here it gets very tricky to avoid nasty situations 
with lots of allocations and frees (when ideally, you'd like 
stuff like this to be stack only).


Don't even ask about how complicated a `.dup` method gets when 
you start trying to extend to such entities. :-(


Re: Dconf 2015 talks...

2016-01-25 Thread Era Scarecrow via Digitalmars-d
On Monday, 25 January 2016 at 17:19:05 UTC, Joseph Rushton 
Wakeling wrote:
Implementing the random algorithms/other wrappers as a class is 
problematic because then you get into the hassle of potentially 
having to new/free a lot of individual heap objects deep in the 
inner loop of your program.  I already tried this in 
hap.random, and came to the conclusion that it wasn't a valid 
approach.


 What about an alternative allocator? Specifically I'm thinking 
in C's equivalent which is alloca (allocates directly on the 
stack with the rest of the variables). If the constructor is a 
function then that won't work; but if it's inlined then it should 
work.


 I suppose the alternate is an option to skip/throw away some 
numbers that should've been consumed (assuming you want to keep 
using the same seed), or seeding each per use.


Re: Dconf 2015 talks...

2016-01-25 Thread Era Scarecrow via Digitalmars-d
On Monday, 25 January 2016 at 14:31:12 UTC, Joseph Rushton 
Wakeling wrote:
It's all the functionality that _wraps_ RNGs, such as random 
algorithms like randomCover and randomSample, or a D equivalent 
of C++'s random distributions.  These need to also be reference 
types (for their internal state as well as the RNG they wrap), 
and here it gets very tricky to avoid nasty situations with 
lots of allocations and frees (when ideally, you'd like stuff 
like this to be stack only).


 Hmm i wonder... If recognizes it as infinite, could it avoid 
treating them as forward ranges? As a struct it still wouldn't 
work, but as a class/reference type it would work then...


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 15:38:45 UTC, Era Scarecrow wrote:
 Hmm i wonder... If recognizes it as infinite, could it avoid 
treating them as forward ranges? As a struct it still wouldn't 
work, but as a class/reference type it would work then...


They shouldn't be forward ranges anyway, because if their content 
is randomly generated then it's not legit for them to implement 
the .save property.  The whole implementation of stuff in 
std.random via forward ranges is a massive design mistake.


Implementing the random algorithms/other wrappers as a class is 
problematic because then you get into the hassle of potentially 
having to new/free a lot of individual heap objects deep in the 
inner loop of your program.  I already tried this in hap.random, 
and came to the conclusion that it wasn't a valid approach.


Re: Dconf 2015 talks...

2016-01-25 Thread Jonathan M Davis via Digitalmars-d
On Monday, 25 January 2016 at 17:19:05 UTC, Joseph Rushton 
Wakeling wrote:

On Monday, 25 January 2016 at 15:38:45 UTC, Era Scarecrow wrote:
 Hmm i wonder... If recognizes it as infinite, could it avoid 
treating them as forward ranges? As a struct it still wouldn't 
work, but as a class/reference type it would work then...


They shouldn't be forward ranges anyway, because if their 
content is randomly generated then it's not legit for them to 
implement the .save property.  The whole implementation of 
stuff in std.random via forward ranges is a massive design 
mistake.


As long as the numbers are pseudo-random, then in theory, there's 
no problem with a range of random numbers being a forward range. 
That could be useful upon occasion (similar to how it can be 
useful that the same seed results in the same sequence of 
numbers). The problem is that if they're a forward range, then 
you tend to get code that accidentally ends up reusing numbers in 
the sequence and that definitely is a problem. So ultimately, 
they probably should be input ranges rather than forward ranges, 
but I think that it's more of an issue with human error than with 
what makes sense from a technical perspective.


Regardless, non-pseudo random number generators obviously can't 
be forward ranges though, since their state isn't savable or 
repeatable.


- Jonathan M Davis



Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d
BTW, I apologize for the rather terse replies so far; busy day 
:-)  I'll try and write out a more complete summary of the 
problem at some point in the near future (though it might have to 
wait 'til the weekend).


Thanks for the interest in contributing to solving this problem 
:-)


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 20:26:12 UTC, Era Scarecrow wrote:
 What about an alternative allocator? Specifically I'm thinking 
in C's equivalent which is alloca (allocates directly on the 
stack with the rest of the variables). If the constructor is a 
function then that won't work; but if it's inlined then it 
should work.


I have been wondering about how allocators could help to deal 
with these problems.  Could you put forward a minimal example of 
how you would see it working?


 I suppose the alternate is an option to skip/throw away some 
numbers that should've been consumed (assuming you want to keep 
using the same seed), or seeding each per use.


I'm not sure I follow what you mean here or why you think this 
would work?  Could you give a concrete example?


Re: Dconf 2015 talks...

2016-01-25 Thread Era Scarecrow via Digitalmars-d
On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton 
Wakeling wrote:
I thought that too, for a long time, but I came to the 
conclusion it's not the case.




That's fine if you're dealing with something whose behaviour is 
meant to be deterministic, but if you call this with a 
pseudo-random sequence, than every time you call it, you will 
get the same result.  It won't matter if what you pass is a  
reference-type -- the .save (which is correct behaviour for 
handling a forward range) means you're just repeatedly using a 
copy of the random sequence.




Right -- non-PRNGs must be input ranges by design.  I came to 
the conclusion that pseudo-RNGs need to be input ranges, but 
that implement an alternative to .save: a .dup method whose 
promise is, "You can duplicate the state and hence behaviour of 
this range, but you can't make any assumptions that this is a 
safe or sane thing to do."



 So in short, the RNG shouldn't be a range at all. Of course it 
could be a struct (for sanity and other reasons), but not a range.


 I wonder then, assuming we remove RNG from being a range, the a 
RNG could give out a delegate of it's current data, and that 
delegate get passed to a range-like wrapper? Or maybe the RNG 
returns a Voldemort range that referrs to the original RNG which 
isn't a range anymore... Actually that sounds promising. Aliasing 
could deal with it automatically converting/creating the range.


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d
On Monday, 25 January 2016 at 18:40:59 UTC, Jonathan M Davis 
wrote:
As long as the numbers are pseudo-random, then in theory, 
there's no problem with a range of random numbers being a 
forward range.


I thought that too, for a long time, but I came to the conclusion 
it's not the case.


Here's the essence of the problem: a forward range is a promise 
to the code that uses it, "I am a deterministic sequence."  But 
the whole point of pseudo-random sequences is that you're not 
supposed to be able to tell them from actual randomness.


If you _tell_ downstream code "I am deterministic", that code may 
make assumptions about how it can use you, that are valid for 
"normal" deterministic sequences, but aren't valid for what are 
supposedly random sequences.


Consider the typical handling of forward ranges:

auto foo (R) (R range)
if (isForwardRange!R)
{
auto rcopy = range.save;

// do something with rcopy
}

That's fine if you're dealing with something whose behaviour is 
meant to be deterministic, but if you call this with a 
pseudo-random sequence, than every time you call it, you will get 
the same result.  It won't matter if what you pass is a 
reference-type -- the .save (which is correct behaviour for 
handling a forward range) means you're just repeatedly using a 
copy of the random sequence.


That could be useful upon occasion (similar to how it can be 
useful that the same seed results in the same sequence of 
numbers).


Right, but the point is that re-use of a pseudo-random sequence 
should happen at programmer request only, not under the hood in 
library code they call -- which is what unfortunately happens if 
you implement RNGs and other random sequences as forward ranges 
:-(



The problem is that if they're a forward range, then you tend 
to get code that accidentally ends up reusing numbers in the 
sequence and that definitely is a problem. So ultimately, they 
probably should be input ranges rather than forward ranges, but 
I think that it's more of an issue with human error than with 
what makes sense from a technical perspective.


Again, I thought that too, but I came to the conclusion that I'd 
been wrong.  It's not about fallible humans, it's about the 
promise a forward range makes.


Superficially, it looks like that promise is: "You can use my 
deterministic nature if you need to."  But actually, I think it 
is really something stronger: "You can use my deterministic 
nature freely and nothing will go wrong in doing so."


That's a much stronger promise than can be allowed for a 
pseudo-RNG, and I think it's borne out in the way in which e.g. 
phobos functionality makes use of forward ranges.


Regardless, non-pseudo random number generators obviously can't 
be forward ranges though, since their state isn't savable or 
repeatable.


Right -- non-PRNGs must be input ranges by design.  I came to the 
conclusion that pseudo-RNGs need to be input ranges, but that 
implement an alternative to .save: a .dup method whose promise 
is, "You can duplicate the state and hence behaviour of this 
range, but you can't make any assumptions that this is a safe or 
sane thing to do."


Re: Dconf 2015 talks...

2016-01-25 Thread Era Scarecrow via Digitalmars-d
On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton 
Wakeling wrote:

On Monday, 25 January 2016 at 20:26:12 UTC, Era Scarecrow wrote:
 What about an alternative allocator? Specifically I'm 
thinking in C's equivalent which is alloca (allocates directly 
on the stack with the rest of the variables). If the 
constructor is a function then that won't work; but if it's 
inlined then it should work.


I have been wondering about how allocators could help to deal 
with these problems.  Could you put forward a minimal example 
of how you would see it working?


 Most likely alloca would have to be built into the compiler. 
Here's a crash course in how the stack memory management works. 
sp=stack pointer, bp=base pointer (more relevant pre 386).


 When you prepare to enter a function (or block) the bp register 
is backed up, the sp is saved in the bp register, then you add 
your arguments, then the function is entered. After the function 
is entered it adds to the sp register which give it a block of 
space for all known local variables at once; then does any 
assignments it needs to. When you leave the function/block you 
simply replace the sp register with your backup the bp register. 
Once you return you restore the old bp register.


 So something close to this:
[code]
  ...
  push bp
  push calling_argument(s)
  call func
  sub sp, -N //size of total arguments pushed
  pop bp //completely done with function call.
  ...


  func: mov bp, sp
  add sp, +N //size of local variables
  // setup variables
  // code

  // cleanup/return
  mov sp, bp
  ret
[/code]

 Technically alloca simply returns the current sp, then adds to 
it the number of bytes you requested. This means you have to run 
it at the function stack where you want to use it (and not in a 
called function, otherwise corruption). So inlined functions 
where alloca's data would remain would be a must. Then it comes 
down to a simple:


[code]
struct RNG {
  int *state; //add assert to functions that this isn't null
  this(int seed) {
state = alloca(sizeof(int));
*state = seed;
  }
}
[/code]

 I suppose the alternate is an option to skip/throw-away some 
numbers that should've been consumed (assuming you want to 
keep using the same seed), or seeding each per use.


I'm not sure I follow what you mean here or why you think this 
would work?  Could you give a concrete example?


certainly.

[code]
struct RNG {
  ...

  void skip(int x) {assert(x>0); while(x--) popfront();}
}

RNG rnd = RND(seed);
rnd.take(10).writeln;  //loosely based on talk examples
rnd.skip(10);  //skips the 'consumed' numbers.
rnd.take(10).writeln;  //won't have identical output

[/code]



Re: Dconf 2015 talks...

2016-01-25 Thread Andrei Alexandrescu via Digitalmars-d

On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:

One option would be to implement the basic RNG data structor à la C++,
as a functor


That's semantically the same as an input range. -- Andrei



Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
 So in short, the RNG shouldn't be a range at all. Of course it 
could be a struct (for sanity and other reasons), but not a 
range.


 I wonder then, assuming we remove RNG from being a range, the 
a RNG could give out a delegate of it's current data, and that 
delegate get passed to a range-like wrapper? Or maybe the RNG 
returns a Voldemort range that referrs to the original RNG 
which isn't a range anymore... Actually that sounds promising. 
Aliasing could deal with it automatically converting/creating 
the range.


Well, an idea that was suggested to me at DConf (by several 
people; thanks in particular to Steve S. and to DiceBot!) was 
indeed to separate RNG state from the range interface, and to 
disable copy-by-value for the state data structure.


One option would be to implement the basic RNG data structor à la 
C++, as a functor: so it'd be something like:


struct SomeRNG(UIntType, ...)
{
  private:
// the RNG state variables

  public:
UIntType opCall()
{
// returns a random variate
}
}

... and again, you'd disable copy-by-value for this entity.  I 
had some fun a while ago playing with this and the appropriate 
technique to wrap it into a range (it's not as trivial as you 
think, because you need to use some tricks to ensure truly lazy 
evaluation of the range, where D ranges typically evaluate the 
first element eagerly).


Where I ran into trouble was considering how to extend this 
approach in a viable way to stuff like randomSample, i.e. the 
stuff which wraps randomness, which also needs to ensure its 
internal state is never copied by value, and yet which needs 
(ideally) to fit nicely into a UFCS chain of ranges:


someInput.filter!(WHATEVER).randomSample(n, 
rng).map!(...).writeln;


I suppose it might be possible to implement a functor-based 
approach for this, that could be wrapped in a range, but it feels 
nasty for something which really ought to fit more naturally into 
the range syntax: a random sample is just an algorithm (similar 
to those in std.algorithm), but one whose elements need to be 
truly lazily evaluated and whose values are not deterministic but 
depend on a source of randomness.


The entire complication comes out of the fact that in order to 
play nice in a statistical sense, you need the RandomSample range 
to be a reference type, but in order to play nice with the kind 
of in-the-middle-of-the-inner-loop use above, you need it to be 
cheap and quick to instantiate and destroy (so classes and heap 
allocation are a problem).


Basically, what you probably want is for RandomSample to be a 
struct, but with a reference-type internal state that is 
nevertheless allocated on the stack and that is cheap to create 
and let go of when you're done with it.


Re: Dconf 2015 talks...

2016-01-25 Thread Joseph Rushton Wakeling via Digitalmars-d

On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
 Most likely alloca would have to be built into the compiler. 
Here's a crash course in how the stack memory management works. 
sp=stack pointer, bp=base pointer (more relevant pre 386).


An apology in advance: I have an early start tomorrow, and a busy 
few days coming up, so I probably won't be able to follow up on 
this discussion until the weekend.


In the meantime, thanks for the food for thought.  I'll try to be 
in touch about this topic soon :-)


Re: Dconf 2015 talks...

2016-01-25 Thread Jonathan M Davis via Digitalmars-d

On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton 
Wakeling wrote:
Right -- non-PRNGs must be input ranges by design.  I came to 
the conclusion that pseudo-RNGs need to be input ranges, but 
that implement an alternative to .save: a .dup method whose 
promise is, "You can duplicate the state and hence behaviour 
of this range, but you can't make any assumptions that this is 
a safe or sane thing to do."



 So in short, the RNG shouldn't be a range at all. Of course it 
could be a struct (for sanity and other reasons), but not a 
range.


Why? They work fine as input ranges regardless of whether having 
them be forward ranges make sense. By its very nature, an input 
range cannot be saved, and it's not assumed that it's 
deterministic. The issue that we run into isn't that they're 
ranges but that they're not implemented as reference types, and 
copies of them accidentally get used, resulting in subtle bugs. 
But I'd strongly argue that we should at least consider requiring 
that all input ranges be reference types, since they don't make 
sense as value types, and pseudo-reference types are just plain 
buggy. That's a wider discussion than random number generator 
ranges though.


- Jonathan M Davis


Re: Dconf 2015 talks...

2016-01-25 Thread Andrei Alexandrescu via Digitalmars-d

On 01/25/2016 04:20 PM, Joseph Rushton Wakeling wrote:

I thought that too, for a long time, but I came to the conclusion it's
not the case.


I see no problem with adding a category of rngs that are not forward. 
Naming is of course the hardest problem in our community :o). A good 
stard would be a /dev/random wrapper. -- Andrei