Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]
On Saturday, November 26, 2016 12:24:47 Andrei Alexandrescu via Digitalmars- d wrote: > On 11/26/2016 01:55 AM, Martin Nowak wrote: > > On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote: > >> Maybe non-copyability needs to become a requirement for InputRanges. > > > > Could have an optional .clone if copying is supported. > > What would be an InputRange where copying is correct? > > Input ranges with disabled this(this) would be a major breaking change > that we probably cannot bear (and shouldn't because the gain is small). > I think an input range would be at best "pure reference" in the sense > that popFront advances all copies to the same position. -- Andrei For a while now, I've been tempted to argue that we should require that well-behaved input ranges be reference types (or at least have reference semantics). It just seems like too much misbehaves if they're not, and arguably, the whole reason that they're input ranges and not forward ranges (aside from the programmer just not bothering to implement save) is that they effectively have reference semantics whether they're actually coded that way or not. We obviously can't test for that behavior at compile time, but it could easily be on the list of things that are documented as a requirement for well-behaved ranges. I'm also tempted to argue that well-behaved forward ranges and greater should have value semantics (in the sense that dynamic arrays do), but that's a much bigger change (it effectively makes save unnecessary), and proper use of save works around that problem (though it's very common that save isn't used as often as it should be). It would clean up a lot of the mistakes that occur with forward ranges though. However, regardless, a non-copyable range would not play nicely at all with how range-based code works right now. You'd have to use ref all over the place, which range-based code simply doesn't do and would not play well with function chaining, since you'd need lvalues, and since you couldn't copy the range to the stack to make it an lvalue, that seems like it would go nowhere fast. - Jonathan M Davis
Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]
On 11/26/2016 01:55 AM, Martin Nowak wrote: On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote: Maybe non-copyability needs to become a requirement for InputRanges. Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct? Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- Andrei
Re: Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]
On Saturday, 26 November 2016 at 06:55:24 UTC, Martin Nowak wrote: Should we split off this discussion to a dlang-study thread? I would personally really welcome that, but subject to the understanding that people agree to look seriously at random algorithms (like RandomSample) and not focus the discussion solely on RNGs.
Re: Mir Random [WIP]
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote: Yes the problems are inadvertent copies and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges as broken. Maybe non-copyability needs to become a requirement for InputRanges. I remember you made that suggestion of disabled this(this) to me a while back, and I did use it for this small collection of RNGs: https://github.com/WebDrake/dxorshift We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG. We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work. RefRange still has the issue that it only forwards the range primitives and not other properties such as the `isUniformRandom` enum. But those are probably fixable. However, these approaches basically cover the case of something where an instance is initialized high up in the program and passed around by ref throughout its lifetime. That doesn't address the question of how random _algorithms_ like `RandomSample` could be updated for safety (since they might often need to be initialized multiple times in the inner loops of a program). See: https://forum.dlang.org/post/gnlvyogolkmocujtn...@forum.dlang.org https://forum.dlang.org/post/gpsilefdillwtleuw...@forum.dlang.org for some discussion of the problem.
Fixing implicit copies of InputRanges [was: Re: Mir Random [WIP]]
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote: Maybe non-copyability needs to become a requirement for InputRanges. Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct? We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work. Same scheme works for files with only plain int fds streams, non-copyable/unique ownership, and move, refRange, or refCounted for passing and sharing. Should we split off this discussion to a dlang-study thread?
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote: On 11/22/16 7:30 PM, John Colvin wrote: On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives. Yes the problems are inadvertent copies and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges as broken. Maybe non-copyability needs to become a requirement for InputRanges. We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG. We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 14:22:05 UTC, Jonathan M Davis wrote: Then call popFront or drop before passing it along if you're paranoid about it. There's little need for it, as it was pointed out earlier that the generate algorithm does the right thing and needs only opCall, also a nice example of orthogonality and composability.
Re: Mir Random [WIP]
On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via Digitalmars-d wrote: > On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis > > wrote: > > It's quite feasible if you don't calculate front until it's > > called or popFront is called, and then you cache the result so > > that subsequent calls to front prior to the call to popFront > > return the same result, but it creates additional overhead, > > because then every call to front and popFront has to check > > whether the value has been calculated yet. And if every range > > in the chain of ranges has to do that, then that additional > > overhead is going to add up. > > Yes, indeed. I actually came up with a general purpose solution > for that a while back via template mixin (although the version in > the post linked to is not the final version: I updated it to use > HaraldZealot's suggestion of making the frontImplementation and > popFrontImplementation the template parameters): > https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d@pu > remagic.com > > In general, it's just cleaner to either calculate front on > > every call to front (which doesn't make sense in the case of > > random number generators) or to calculate the first front upon > > construction and be done with it. > > In some ways I think it's cleaner to not privilege the first > front value and go for "true" laziness for all front values -- > particularly when dealing with stuff like RandomSample. It's much messier from an implementation standpoint. If the constructor checks for empty and caches front if the range it was given was non-empty, then every call to front can just return front, and every call to popFront can just pop the element off and do whatever it does to turn the front from the wrapped range into the front of this range and then cache it. If the constructor doesn't do that work, then every call to front and every call to popFront have to check whether they're the first time that either has been called and _then_ they end up doing all of the same work anyway. So, there's more code, and it's less efficient. And if every range in the chain does that, then you're getting a lot of extra checks just to determine whether it's the first time that front or popFront is being called. Now, if there's never any caching of the front value, and front always calculates it, then that might actually be more efficient if front is only ever called once per element, because you don't get the cost of copying the element to cache it, but if front gets called multiple times (which happens fairly often, I think), then the cost is definitely greater. You're never going to be able to rely on a fully lazy front anyway unless we specified it as part of the range API that all ranges work that way, and as it stands, very few ranges do. > The downside is that `.front` becomes non-const. Well, realistically, as it stands, ranges are utterly useless with const anyway. We'd have to have a standard mechanism for getting a tail-const copy of a range from a const or immutable range, and we don't. > Sometimes I > wonder whether it wouldn't have been better to require that > `popFront()` be called before even the first call to `.front`, > and place the onus on `popFront()` to handle any special-casing > of the first `.front` value. That would be terrible with chaining ranges. It also would be _very_ error-prone. It's also heading into two-phase initialization territory, which is almost always a bad idea. > > And I think that the general consensus has been that > > calculating front in the constructor and popFront and caching > > works better than calculating it on every call to front, but > > that doesn't always work (e.g. map), and that caching does > > cause issues occasionally. > > The case I always think of is: what if your input range is > designed to correspond to sensor observations made at a > particular time? This is a case where cacheing the first value > on construction is very problematic. > > For RNGs it's really not such a big deal so long as successive > variates are independent, but for random algorithms I think the > laziness matters, since their popFront is essentially > IO-dependent (in the Haskell-style sense that randomness is IO). Honestly, I don't think that it really works to have a range where the value of its elements depend on when you call front or popFront. The API lets you do it, but you're going to run into plenty of problems if you do unless you don't care about the frequency at which the values are generated. Part of the problem here is that ranges and the algorithms that use them are really designed with arrays in mind. A non-random-access range reduces those capabilities, and a basic input range doesn't actually require that the elements be reproducible aside from multiple calls to front giving the same result, but it's still the case that it's pretty much assumed that a range is a fixed set of values, and pretty much
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis wrote: It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up. Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmar...@puremagic.com In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it. In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample. The downside is that `.front` becomes non-const. Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value. And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally. The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 08:36:41 UTC, Andrea Fontana wrote: On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote: Interesting. Could you please add a couple of links about that? -- Andrei http://xoroshiro.di.unimi.it/ Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c Implementation in D, written during DConf 2016 ;-) https://github.com/WebDrake/dxorshift
Re: Mir Random [WIP]
On 24.11.2016 14:50, Kagamin wrote: On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote: Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2. It can't be possibly correct when a.length is larger than int.max because it doesn't recover information lost in a narrowing conversion. The code does not specify that this information should be recovered. Why is this the compiler's problem?
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 17:04:43 UTC, John Colvin wrote: If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos. No, it should work without DRuntime. Nicholas wrote that it will work on GPU using dcompute :-) BetterC is the goal for the future Mir-Phobos, hehe
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 16:09:23 UTC, Ilya Yaroshenko wrote: On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote: Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davis The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC). If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos.
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote: Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davis The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC). Refcounting is to slow to implement for users. Note, that Engines is only backend RNGs. There are also nonuniform distributions (WIP). See the example of users code: https://forum.dlang.org/post/nyvtoejvmsaolzaqy...@forum.dlang.org . It is very difficult to implement both user random variable and Engine using Range API. Note, that code should work without DRuntime and should be simple (the example is user code). Ilya
Re: Mir Random [WIP]
On Thursday, November 24, 2016 13:54:50 Kagamin via Digitalmars-d wrote: > On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis > > wrote: > > How so? Because someone might call range.front again without > > bothering to call popFront? > > That's what everything in std.algorithm does. Then call popFront or drop before passing it along if you're paranoid about it. If it's a serious concern that this be fixed in general, then the obvious fix to that would be to make it so that rndGen() just called popFront before returning it. Heck, if rndGen() were guaranteed to call popFront when it was called rather than simply returning the range, then you could just do rndGen().front whenever you wanted the equivalent of rand(), making it trivial to use rndGen() both for cases where you wanted a single number and for cases where you wanted a range of them - and without worrying about front being stale. - Jonathan M Davis
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote: How so? Because someone might call range.front again without bothering to call popFront? That's what everything in std.algorithm does.
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote: Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2. It can't be possibly correct when a.length is larger than int.max because it doesn't recover information lost in a narrowing conversion.
Re: Mir Random [WIP]
On Thursday, November 24, 2016 09:05:34 Kagamin via Digitalmars-d wrote: > On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis > > wrote: > > though I think that using the comma operator like that is > > deprecated now. Adding a helper function such as > > > > auto getNext(R)(ref R range) > > if(isInputRange!R) > > { > > range.popFront(); > > return range.front; > > } > > > > would solve that problem. > > It's the same behavior and suffers from the same problem of reuse > of RNG output. How so? Because someone might call range.front again without bothering to call popFront? If you're paranoid about that, then it can call popFront again before returning (though that would be wasteful). My take on it is that if you just call popFront before using the random number generator range, then you don't have to worry about what any other code that used it did. Regardless, the range API is _way_ more useful in general than something like rand(). And if anyone is trying to avoid ranges with random numbers, I think that they're just making their life harder. Occasionally, it's useful to get just one random number, in which case, having to deal with the range API over rand() is kind of annoying, but in general, it's not a problem, and the wrapper function that I suggested basically gives you rand() from a range of random numbers. Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davis
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 10:41:44 UTC, Kagamin wrote: On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote: I was talking about andrei's example. He has a cast(int) in it before the division. And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better that one shift? Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2.
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote: I was talking about andrei's example. He has a cast(int) in it before the division. And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better that one shift?
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 10:14:27 UTC, Kagamin wrote: On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote: That's because of the cast(int), dividing by two is optimised just fine. What about Andrei's example? I was talking about andrei's example. He has a cast(int) in it before the division.
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote: That's because of the cast(int), dividing by two is optimised just fine. What about Andrei's example?
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 08:55:18 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote: [Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening. That's three instructions instead of one shr eax,1 That's because of the cast(int), dividing by two is optimised just fine. You can even do this: int foo(int a) { return a / 2; } uint foo(uint a) { return a / 2; } int bar(int a) { if (a > 0) return a / 2; else assert(0); } and get int example.foo(int): // slow, to deal with sign mov eax, edi shr eax, 31 add eax, edi sar eax ret uint example.foo(uint): // fast because no sign mov eax, edi shr eax ret int example.bar(int): // single shift because sign always 0 testedi, edi jle .L8 mov eax, edi sar eax ret .L8: ud2 This should help explain why the extra/different instructions are necessary for signed: int foo0(int a) { asm { naked; mov EAX, EDI; shr EAX, 31; add EAX, EDI; sar EAX, 1; ret; } } int foo1(int a) { asm { naked; mov EAX, EDI; sar EAX, 1; ret; } } int foo2(int a) { asm { naked; mov EAX, EDI; shr EAX, 1; ret; } } void main() { import std.stdio; foreach(i; [int.min, -1, 0, 1, int.max]) writeln(foo0(i), ' ', foo1(i), ' ', foo2(i)); } output: -1073741824 -1073741824 1073741824 0 -1 2147483647 0 0 0 0 0 0 1073741823 1073741823 1073741823
Re: Mir Random [WIP]
On 23.11.2016 00:55, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei I would posit that it is not actually natural to model random numbers as ranges. Not every random object is a sequence. Different samples are (ideally) independent, so there is little conceptual motivation to group them together -- it just increases the logistic overhead needed to get at the single samples. I.e., the most natural way to use a PRNG range is as follows: PRNG range; auto sample(){ auto r=range.front; range.popFront(); return r; } PRNGs are an example where global state is actually desirable, because in contrast to most other programming tasks, usually less predictability is good.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei Alexandrescu wrote: On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote: I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future. Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable [a, b) was added. `rand` dose not use IEEE arithmetic to generate a real random number. --Ilya
Re: Mir Random [WIP]
On Thursday, 24 November 2016 at 08:59:03 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko wrote: Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya You want to instantiate and seed Mt every time you call a function that may need random numbers? A function can use a global RNG defined by a user or accepts RNG by reference. Range adaptor stores pointer, not value.
Re: Mir Random [WIP]
On 24.11.2016 09:55, Kagamin wrote: On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote: [Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening. That's three instructions instead of one shr eax,1 That would be wrong code. Cast to int after dividing.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis wrote: though I think that using the comma operator like that is deprecated now. Adding a helper function such as auto getNext(R)(ref R range) if(isInputRange!R) { range.popFront(); return range.front; } would solve that problem. It's the same behavior and suffers from the same problem of reuse of RNG output.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko wrote: Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya You want to instantiate and seed Mt every time you call a function that may need random numbers?
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote: [Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening. That's three instructions instead of one shr eax,1
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote: Interesting. Could you please add a couple of links about that? -- Andrei http://xoroshiro.di.unimi.it/ Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c
Re: Mir Random [WIP]
I have to agree that creating a different API that uses opCall or whatever instead of a the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M Davis Would, assuming the OpCall() interface, generate!(() => rng()) make an input range out of it easily? Or am I missing something?
Re: Mir Random [WIP]
On Wednesday, November 23, 2016 15:29:14 Kagamin via Digitalmars-d wrote: > On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei > > Alexandrescu wrote: > > This claim would apply to all ranges. There seems to be > > evidence it is unfounded. > > > > The main argument for using the range interface for RNGs is > > reuse of abstraction. Minute implementation matters are much > > less important. The main counter-argument is that the > > abstraction is not fitting well and another abstraction (such > > as opCall) is more advantageous. > > Consider this okayish looking code: > consume(rng()); > consume(rng.take(2)); //reuses previous value > consume(rng()); //discards unused value In the cases where I don't really need a range of random numbers, I've typically ended up doing stuff like auto nextNum = rndGen().popFront(), rndGen().front; though I think that using the comma operator like that is deprecated now. Adding a helper function such as auto getNext(R)(ref R range) if(isInputRange!R) { range.popFront(); return range.front; } would solve that problem. And there are plenty of cases where having a range of random numbers is extremely useful. After having dealt with, std.random, it would really suck to have to deal with a random number generator that was not range-based. std.random has problems such as not using reference types for its ranges, but using the range API in the way it does is extremely useful. - Jonathan M Davis
Re: Mir Random [WIP]
On Wednesday, November 23, 2016 08:54:30 Andrei Alexandrescu via Digitalmars-d wrote: > On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote: > > I think that Range API is bad and useless overengineering for RNGs. > > So it either "takes 1 minute" to change the interface from opCall to > ranges (i.e. they're virtually the same thing), or it's the above. > There's precious little overengineering that can be done in 1 minute > > :o). I see you did that in a dozen lines in RandomRangeAdaptor. > > I understand you believe the range interface is unnecessary or overkill > for random number generators. I may even agree for certain cases. > However, there are a few arguments you may want to consider: > > * By virtue of working with D, everybody know what a range is. > Presenting the RNGs that way instantly evokes a host of knowledge, > understanding, and idioms. > > * D users (or at least a fraction) and several algorithms understand the > notion of an infinite range and make salient decisions based on that. A > range interface automatically taps into that. I've frequently found that in cases where I'm not operating on a range of values, I really just want a way to get a random number, and having to call both popFront and front to get it is a bit annoying (it's actually one of the few places that I've ever used the comma operator). That being said, I think that the correct solution to that is to add a function to std.random that takes a range, calls popFront on it, and then returns front so that it can work with any of the random number generators and avoids problems with whether or not popFront was called prior to calling front. I have to agree that creating a different API that uses opCall or whatever instead of a the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M Davis
Re: Mir Random [WIP]
On Wednesday, November 23, 2016 14:02:36 Joseph Rushton Wakeling via Digitalmars-d wrote: > On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei > > Alexandrescu wrote: > > Well I do see another small problem with > > https://github.com/libmir/mir-random/blob/master/source/random/algorithm > > .d#L19. It creates the first value eagerly upon construction. That's > > just a bit awkward. It's not the first time I'm seeing this issue, and > > to be honest am a bit bummed about it. It's a consequence of all ranges > > needing to buffer at least one element. -- Andrei > Yea, this touches on one of my more general concerns related to > ranges, randomness and laziness. Part of the trouble is we don't > have a really good general design for how to implement _truly_ > lazy ranges, where the value of `front` is not determined until > the point where you first access it. > > This has particular relevance to e.g. RandomSample and > RandomCover. It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up. In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it. And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally. - Jonathan M Davis
Re: Mir Random [WIP]
On 11/23/16 3:05 PM, Ilya Yaroshenko wrote: On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote: On 11/23/16 2:40 PM, Ilya Yaroshenko wrote: On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote: A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya Yah, problem here is we can't make that @safe. -- Andrei Can we improve D to make it safe? It's difficult in the general case for objective reasons (modular analysis). For this, we may be able to take an argument by ref, save its address inside, add some scope attributes. It's a project. -- Andrei
Re: Mir Random [WIP]
On 11/23/16 11:10 AM, Kagamin wrote: On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote: That seems to be a minor matter. Of course at best some measurements would be in order. Yes, it's just Joseph worries about microoptimizations. Though we saw that the compiler can't optimize some simple operations like division by 2. [Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening. Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote: On 11/23/16 2:40 PM, Ilya Yaroshenko wrote: On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote: A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya Yah, problem here is we can't make that @safe. -- Andrei Can we improve D to make it safe?
Re: Mir Random [WIP]
On 11/23/16 2:40 PM, Ilya Yaroshenko wrote: On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote: A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya Yah, problem here is we can't make that @safe. -- Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote: A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Re: Mir Random [WIP]
On 11/23/2016 01:13 PM, Ilya Yaroshenko wrote: 1. Engine must not have implicit copy semantic: it is not correct for RNGs and has performance issues. They also should not be copied explicitly in this example. OK, so (lack of) copyability is a good argument. I'm ready to live with random generators that are noncopyable values and use opCall; then use a range adaptor on top of a pointer to that. It seems providing the range primitives for a noncopyable object is not useful and is liable to create more confusion and frustration than a distinct API. A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 18:13:52 UTC, Ilya Yaroshenko wrote: Assume your are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random: EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei Alexandrescu wrote: On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote: I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future. Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei Floating point generation are implementation matter. Distributions and algorithms are design matter. Assume your are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random: - struct Mod(D) if(isRandomVariable!D && isFloatingPoint!(ReturnType!D)) { D irv; alias irv this; ReturnType!D opCall(G)(ref G gen) if(isSURBG!D) { ReturnType!D x1 = void; do x1 = irv(gen); while(x1 == 0); ReturnType!D x2 = irv(gen); / / X1 and X2 are independent! /// return 1 / x1 + x2; } } unittest { auto gen = Xorshift(1); auto rv = Mod!GammaRandomVariable(...); auto x = rv(gen); /// generator is never copied by value. } - How it can be done with Range interface instead of opCall? Please note that users would use Distributions, not Engines. So, Range API requirements are: 1. Engine must not have implicit copy semantic: it is not correct for RNGs and has performance issues. They also should not be copied explicitly in this example. 2. Do not use Engine's pointers in RandomVariable, because RandomVariables can be passed easily outside of function: they are just a small tuples of params. 3. Do not use classes because of BetterC and performance issues. 4. Engines must have Range interface 5. RandomVariables (both Mod and an underlaying) must have Range interface. I don't think Range API for random variables can be done easily and without performance or security issues. Thanks, Ilya
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote: Interesting. Could you please add a couple of links about that? -- Andrei http://xoroshiro.di.unimi.it/
Re: Mir Random [WIP]
On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote: I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future. Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote: That seems to be a minor matter. Of course at best some measurements would be in order. Yes, it's just Joseph worries about microoptimizations. Though we saw that the compiler can't optimize some simple operations like division by 2.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 16:01:14 UTC, Ryan wrote: On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote: auto m = ().map!(a=>1); ...? (Untested, but I think it should work.) Tested, works with 2.071.1 r.randomRangeAdaptor.map!(a=>1);
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 15:59:39 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote: auto m = ().map!(a=>1); ...? (Untested, but I think it should work.) Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :) What the principal difference with randomRangeAdaptor? Anyway, noncopyable input range con not be used with Phobos. --Ilya
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote: auto m = ().map!(a=>1); ...? (Untested, but I think it should work.) Tested, works with 2.071.1
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote: auto m = ().map!(a=>1); ...? (Untested, but I think it should work.) Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei Alexandrescu wrote: On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote: On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei It is safe low level architecture without performance and API issues. I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.) A range to use it with std.algorithm and std.range must be copyable (it is passed by value. It prevents users to do stupid things implicitly (like copying RNGs). An input range can be made noncopyable. Ditto. A noncopyable input range is useless. A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). Is there a reason to not have that now? Done. See `RandomRangeAdaptor`: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here. The main reason in implementation simplicity. Engines should be simple to create, simple to maintain, and simple to use. opCall is more simple then range interface because 1. One declaration instead of 4 (3 range functions for plus latest generated value (optional)) 2. Input range is useless if range is not copyable. 3. `randomRangeAdaptor` is implemented for Engines and will be done for Distributions too. So range API is supported better then in std.range (because Engines are copied). In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added. Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation? Simplicity is main motivation. ### Example of API+implementation bug: Bug: RNGs has min and max params (hello C++). But, they are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits uniformly. It is RNG problem how to do it. Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.) Current Mir solution looks like pair isURBG and isSURBG. `S` prefix means `T.max == ReturnType!T.max` where T is an Engine. So, functions use isSURBG now. The min property is not required: we can just subtract actual min from a returning value. An adaptor can be added to convert URBG to Saturated URBG. I will not fill this bug as well another dozen std.random bugs because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 expansion of 32 bit algorithms must be proved theoretically before we allow it for end users. 64 bit analogs are exists, but they have another implementations. One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote: That seems to be a minor matter. Yes, it's just Joseph worries about microoptimizations. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous. Ranges are designed for reuse of front, which is not desirable for RNGs.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 15:25:29 UTC, Kagamin wrote: struct A { bool empty; int front; void popFront(){} @disable this(this); } import std.algorithm; void f() { A r; auto m=r.map!(a=>1); } Does this compile for you? auto m = ().map!(a=>1); ...? (Untested, but I think it should work.)
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote: On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value Also consider the case of using 1 generator in your program, but calling it from different places. In one place, you use opCall with the popFront -> front order of calls, and in the other you use the range interface directly with the order reversed. You would re-use values there too. By offering both interfaces together it makes it pretty easy to create silly bugs like this, especially in a large project. Might be useful to have a basic RNG interface and then wrap it with a template that provides either, but not both, of the other interfaces at a time.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei Alexandrescu wrote: An input range can be made noncopyable. If you implement it as a noncopyable input range, you get a large support network working for you. struct A { bool empty; int front; void popFront(){} @disable this(this); } import std.algorithm; void f() { A r; auto m=r.map!(a=>1); } Does this compile for you?
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote: This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous. Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value
Re: Mir Random [WIP]
On 11/23/2016 09:12 AM, Kagamin wrote: xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate font from popFront. That seems to be a minor matter. Of course at best some measurements would be in order. Also this API is prone to reuse some values and discard others, call popFront after front. This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous. Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote: We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG. Didn't we have a generic Ref!T wrapper in std.typecons that would work this way? Can't find it now.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton Wakeling wrote: struct MyRNG { void popFront() { /* update internal state */ } UIntType front() @property { return /* whatever part of internal state */; } UIntType opCall() { this.popFront(); return this.front; } } xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate font from popFront. Also this API is prone to reuse some values and discard others, call popFront after front.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei Alexandrescu wrote: Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
Re: Mir Random [WIP]
On 11/23/2016 08:26 AM, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote: Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it. Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:44:00 UTC, Andrei Alexandrescu wrote: Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- Andrei Would you be up for a direct discussion of this some time -- maybe next week once I've had time to re-review the fine details of my concerns? I can provide a summary of issues as a starting point for that discussion, if it would be useful. I suggest this because (i) I'm interested in your insight and (ii) even if we end up disagreeing on concerns and solutions, it would be good to make sure that we're on the same page in terms of understanding each other's point of view.
Re: Mir Random [WIP]
On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote: I think that Range API is bad and useless overengineering for RNGs. So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody know what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that. Andrei
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:35:33 UTC, Ilya Yaroshenko wrote: 1. Current default RNG (Mt19937) has not state for the latest value. That's a detail of your implementation, though, and it's not true for lots of other pseudo-RNGs. 2. The structure is allocated on stack and compilers can recognise loop patterns and eliminate addition memory movements for many cases. Fair enough. I'm still not sure, though, why you would object to providing public methods for pseudo-RNGs to (i) update the internal state and (ii) access the most recently generated variate, which the `opCall` would then wrap.
Re: Mir Random [WIP]
On 11/23/2016 05:47 AM, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote: I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue. I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated. Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- Andrei
Re: Mir Random [WIP]
On 11/23/2016 01:16 AM, Ilya Yaroshenko wrote: The std.range is good example of degradation, it has a lot of API and implementation bugs. EDIT: std.range -> std.random Oh, that narrows it down. Thx. -- Andrei
Re: Mir Random [WIP]
On 11/23/2016 05:27 AM, Joseph Rushton Wakeling wrote: Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future. Good idea - static assert is your friend. -- Andrei
Re: Mir Random [WIP]
On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote: On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei It is safe low level architecture without performance and API issues. I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.) It prevents users to do stupid things implicitly (like copying RNGs). An input range can be made noncopyable. A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). Is there a reason to not have that now? Again, I'm literally talking about offering front/popFront in lieu of opCall(). The only implementation difference is you keep the currently-generated number as a member instead of returning it from opCall. I doubt one could measure a performance difference. If you implement it as a noncopyable input range, you get a large support network working for you. With opCall, we have virtually no such support - you need to do everything once again. In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here. In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added. Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation? Current Phobos evolution is generic degradation: more generic and "universal" code hide more uncovered bugs in the code. The std.range is good example of degradation, it has a lot of API and implementation bugs. Do you have examples of issues outside random number generators? ### Example of API+implementation bug: Bug: RNGs has min and max params (hello C++). But, they are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits uniformly. It is RNG problem how to do it. Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.) I will not fill this bug as well another dozen std.random bugs because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 expansion of 32 bit algorithms must be proved theoretically before we allow it for end users. 64 bit analogs are exists, but they have another implementations. One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall. Phobos degrades because we add a lot of
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote: On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote: - 64-bit Mt19937 initialization is fixed - 64-bit Mt19937 is default for 64-bit targets I wonder why Mersenne Twister is *always* choosen over other algorithms. My vote goes for CMWC, of course. Andrea A PR for CMWC is highly welcome!
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote: Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it. 1. Current default RNG (Mt19937) has not state for the latest value. 2. The structure is allocated on stack and compilers can recognise loop patterns and eliminate addition memory movements for many cases.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote: I wonder why Mersenne Twister is *always* choosen over other algorithms. The weight of history, I suspect. Mersenne Twister was the major new high-quality RNG back when people started getting really concerned about having good defaults, and when the Diehard Tests were the state of the art in tests of randomness. IIRC there's also a benefit in terms of dimensionality, which some more recent generators don't address, which can make it a safer "all-round default". Agree that there are probably better choices for today, though.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote: Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote: Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided. Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote: - 64-bit Mt19937 initialization is fixed - 64-bit Mt19937 is default for 64-bit targets I wonder why Mersenne Twister is *always* choosen over other algorithms. My vote goes for CMWC, of course. Andrea
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote: For example, Mir Random basic utilities (more low level then distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore std.random code. opCall would almost always more convenient to use. Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes. We don't restrict. It is 1 minute to write an Range wrapper. If all you have is `opCall` then the range wrapper is less efficient than it need be. Inlining will solve this problem with data duplication (if any) for most real world cases. This wrapper can be added to the library, if we found a real world use case. Again, I have not seen a real world algorithm, which looks better with Range API for generators. RandomSample can be created without Range API, and it would look more convenient then it is now. I think that Range API is bad and useless overengineering for RNGs. Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided. Yes, please file a GitHub issue.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote: For example, Mir Random basic utilities (more low level then distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore std.random code. opCall would almost always more convenient to use. Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes. We don't restrict. It is 1 minute to write an Range wrapper. If all you have is `opCall` then the range wrapper is less efficient than it need be. This wrapper can be added to the library, if we found a real world use case. Again, I have not seen a real world algorithm, which looks better with Range API for generators. RandomSample can be created without Range API, and it would look more convenient then it is now. I think that Range API is bad and useless overengineering for RNGs. Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote: It is safe low level architecture without performance and API issues. It prevents users to do stupid things implicitly (like copying RNGs). A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this: struct MyRNG { void popFront() { /* update internal state */ } UIntType front() @property { return /* whatever part of internal state */; } UIntType opCall() { this.popFront(); return this.front; } } (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.) In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. Can you give some examples of what you mean here? For example, Mir Random basic utilities (more low level then distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore std.random code. opCall would almost always more convenient to use. In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to. We don't restrict. It is 1 minute to write an Range wrapper. This wrapper can be added to the library, if we found a real world use case. Again, I have not seen a real world algorithm, which looks better with Range API for generators. RandomSample can be created without Range API, and it would look more convenient then it is now. I think that Range API is bad and useless overengineering for RNGs.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko wrote: It is only question of `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined. I think we're at an impasse in terms of priorities, because that's exactly the reason that I think you should leave the Random alias pointing to the same generator, and let the people with speed/quality concerns select the alternative generator ;-)
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote: [...] All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using. We have a Random alias. I think it is OK if it generates different numbers for different arch and library versions. If a user want to exact the same behaviour he can use explicit names like Mt19937_32 and Mt19937_64. Both are defined for all architectures. 64-bit has not significant speed degradation on 64-bit machines for 32-bit random number generation. Maybe only few %. So it is better for 64-bit machines. It is only question of `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote: It is safe low level architecture without performance and API issues. It prevents users to do stupid things implicitly (like copying RNGs). A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this: struct MyRNG { void popFront() { /* update internal state */ } UIntType front() @property { return /* whatever part of internal state */; } UIntType opCall() { this.popFront(); return this.front; } } (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.) In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. Can you give some examples of what you mean here? In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote: I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue. I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote: Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Comparing with all other libraries (any language) the basic uniform FP numbers will be generated in interval (-1, +1) and contains _all_ possible values including all subnormal numbers. 64 bit generators are 2 times faster for this task if you need to generate a 64 bit floating point number. Explanation of technique will be in my post/article. --Ilya All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton Wakeling wrote: On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote: ### Example of API+implementation bug: Bug: RNGs has min and max params (hello C++). But, they are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits uniformly. It is RNG problem how to do it. Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future. Good point, will add this. --Ilya
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote: ### Example of API+implementation bug: Bug: RNGs has min and max params (hello C++). But, they are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits uniformly. It is RNG problem how to do it. Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote: On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei It is safe low level architecture without performance and API issues. It prevents users to do stupid things implicitly (like copying RNGs). A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added. Current Phobos evolution is generic degradation: more generic and "universal" code hide more uncovered bugs in the code. The std.range is good example of degradation, it has a lot of API and implementation bugs. EDIT: std.range -> std.random
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei It is safe low level architecture without performance and API issues. It prevents users to do stupid things implicitly (like copying RNGs). A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added. Current Phobos evolution is generic degradation: more generic and "universal" code hide more uncovered bugs in the code. The std.range is good example of degradation, it has a lot of API and implementation bugs. ### Example of API+implementation bug: Bug: RNGs has min and max params (hello C++). But, they are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. Solution: In Mir Rundom any RNGs must generate all 8/16/32/64 bits uniformly. It is RNG problem how to do it. I will not fill this bug as well another dozen std.random bugs because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 expansion of 32 bit algorithms must be proved theoretically before we allow it for end users. 64 bit analogs are exists, but they have another implementations. Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases. Phobos really follows stupid idealistic idea: more generic is better, more API is better, more universal algorithms is better. The problems is that Phobos/DRuntime is soup where all (because its "universality") interacts with everything.
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton Wakeling wrote: On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote: - 64-bit Mt19937 is default for 64-bit targets This means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits). Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Comparing with all other libraries (any language) the basic uniform FP numbers will be generated in interval (-1, +1) and contains _all_ possible values including all subnormal numbers. 64 bit generators are 2 times faster for this task if you need to generate a 64 bit floating point number. Explanation of technique will be in my post/article. --Ilya
Re: Mir Random [WIP]
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote: On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote: These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG Interesting. Could you please add a couple of links about that? -- Andrei http://www.pcg-random.org/other-rngs.html ;-)
Re: Mir Random [WIP]
On 11/22/16 7:30 PM, John Colvin wrote: On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lot's of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities. The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives. Perhaps you have an insight that has been missed that can make a good rng range without causing less than optimal performance or statistically unsafe default behaviour? We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG. IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view. I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue. Andrei
Re: Mir Random [WIP]
On 11/22/16 7:36 PM, Joseph Rushton Wakeling wrote: * random _generators_, i.e. sources of uniformly distributed random bits: - random _engines_ (seedable, pseudo-random algorithms) - random _devices_ (non-deterministic sources of uniformly distributed bits) * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution - note _this includes uniformly-distributed integers_! - also uniformly-distributed floats, enums, etc. - and also non-uniform distributions * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making - randomCover, randomSample, etc. Yah, I think that would be nice. -- Andrei
Re: Mir Random [WIP]
On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote: These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG Interesting. Could you please add a couple of links about that? -- Andrei
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote: - 64-bit Mt19937 is default for 64-bit targets This means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote: # Integer uniform generators [WIP] # Real uniform generators [WIP] # Nonuniform generators [WIP] As we discussed in relation to Seb's project, I think this is a problematic conceptualization of the best way to structure functionality related to randomness. An arguably better way (as outlined in the C++11 standard) is to think in terms of: * random _generators_, i.e. sources of uniformly distributed random bits: - random _engines_ (seedable, pseudo-random algorithms) - random _devices_ (non-deterministic sources of uniformly distributed bits) * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution - note _this includes uniformly-distributed integers_! - also uniformly-distributed floats, enums, etc. - and also non-uniform distributions * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making - randomCover, randomSample, etc. The point of the above breakdown is that it gives a nice and clear separation of concerns that allows for easily replaceable components. Separating out stuff just by the ultimate result you want (integers vs. floats, uniform vs. non-uniform, etc.) isn't helpful in that way.
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lot's of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities. Perhaps you have an insight that has been missed that can make a good rng range without causing less than optimal performance or statistically unsafe default behaviour? IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view.
Re: Mir Random [WIP]
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote: On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei Yes, I think this is avoiding the existing problems with RNGs and ranges rather than solving them. I don't blame anyone for _wanting_ to avoid them; they are nasty, subtle issues that seem to keep getting more complex the more one looks at them (for example, after my DConf talk last year, I realized that there were a whole set of other potential complications related to how ranges typically treat laziness). But I think they can be solved, and should be. OTOH, there's no reason per se why there should not be an `opCall` for random number generators along the lines of, UIntType opCall() { this.popFront(); return this.front; } ... just to provide options to the user. (BTW, note the order there, which touches on the issues related to what lazy evaluation means not just for RNGs but for any non-deterministic IO.)
Re: Mir Random [WIP]
On 11/22/16 1:31 AM, Ilya Yaroshenko wrote: - `opCall` API instead of range interface is used (similar to C++) This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei