Re: legacy code retreat's triva game : the D version

2013-12-22 Thread Marco Leise
Am Sun, 22 Dec 2013 02:12:51 +0100
schrieb Timon Gehr timon.g...@gmx.ch:

 On 12/22/2013 02:09 AM, Timon Gehr wrote:
 
  The morale is that uniform random numbers doesn't imply that
  every value in the range will eventually be generated once!
 
 
  Yes it does. (The probability that some value is never generated is 0.)
  The actual morale is that random number generators do not generate true
  randomness, and poor random number generators may generate sequences
  that do not look remotely random.
 
 'pseudo random number generators' would be a more accurate term.

Can you elaborate a bit? How do you know that the Java LCG
can produce every 32-bit integer once? If that's true then
the problem with the Java code was something different and I
was just biased, because I was already expecting the code to
fail before the fact. (Expectations can do strange things to
your perception.)

-- 
Marco



Re: legacy code retreat's triva game : the D version

2013-12-22 Thread Chris Cain

On Sunday, 22 December 2013 at 08:06:30 UTC, Marco Leise wrote:

Can you elaborate a bit? How do you know that the Java LCG
can produce every 32-bit integer once? If that's true then
the problem with the Java code was something different and I
was just biased, because I was already expecting the code to
fail before the fact. (Expectations can do strange things to
your perception.)


If I may,

http://en.wikipedia.org/wiki/Linear_congruential_generator

Definition of an LCG:
```
Xnext = (a * Xprev + c) % m
```

An LCG is said to have a full period if the length of the 
period is m. If the period is m, we know the LCG must produce 
every number between 0 and m because if there was even one 
repeated number then the generator as defined above would repeat 
the entire sequence up to that point and, thus, the period would 
not be m, which is a contradiction.


According to the Hull-Dobell Theorem, an LCG will have a full 
period iff:

1. `c` and `m` are relatively prime.
For Java, c = 11 and m = 2^48
This condition applies.
2. `(a - 1)` is divisible by all prime factors of m`
For Java, a = 25214903917 and thus a-1 is even which means the 
prime factors of m (just 2) do divide it.

This condition applies.
3. `a - 1` is a multiple of 4 if `m` is a multiple of 4.
For Java, m is a multiple of 4.
`(a - 1)/4` is 6303725979, so it's also a multiple of 4.
This condition applies as well.

Since Java's LCG has a full period over 2^48, we know that taking 
the top 32 bits (which is what Java does to get better 
randomness) would also all be represented.


Re: legacy code retreat's triva game : the D version

2013-12-22 Thread John Colvin

On Sunday, 22 December 2013 at 07:29:22 UTC, ilya-stromberg wrote:

On Saturday, 21 December 2013 at 20:43:27 UTC, bearophile wrote:

3) Just like the integer '5' a range of values as 0 .. 1000 is 
an immutable value. So a variable that scans such range should 
be immutable. If you really want to mutate such variable you 
should add a modifier like mutable or mut or something. 
Another common trap in D coding is iterating on an array of 
structs with foreach, mutating the current struct and 
forgetting that you are mutating only a _copy_ of the items. 
Unfortunately there is no mutable keyword in D, and Walter 
rejected all this idea. So the next best thing it to always 
put immutable at the foreach variable, unless you want to 
mutate it or if you can't use const/immutable for some other 
reason.


Why did Walter reject this idea?
BTW, we don't need `mutable` keyword to implement this idea. We 
should just deny any mutation of item copy. If you really need 
to store temporary result, add new variable. For example:


foreach(i; arr)
{
   ++i; //error - this variable contains copy of data, not a 
ref to the original data


   auto temp_i = i + 1; //OK
}

We already have similar errors, for example:

void foo()
{
   int i;
   i; //Error: var has no effect in expression (i)
}


Those are quite different. The first one does have an effect, 
it's just that the effect is only local to the loop scope. Even 
that isn't guaranteed, as ++i could have side-effects.


Re: legacy code retreat's triva game : the D version

2013-12-22 Thread Timon Gehr

On 12/22/2013 09:06 AM, Marco Leise wrote:

Am Sun, 22 Dec 2013 02:12:51 +0100
schrieb Timon Gehr timon.g...@gmx.ch:


On 12/22/2013 02:09 AM, Timon Gehr wrote:


The morale is that uniform random numbers doesn't imply that
every value in the range will eventually be generated once!



Yes it does. (The probability that some value is never generated is 0.)
The actual morale is that random number generators do not generate true
randomness, and poor random number generators may generate sequences
that do not look remotely random.


'pseudo random number generators' would be a more accurate term.


Can you elaborate a bit?


The probability that a certain number does not occur in one round is 
(n-1)/n.


((n-1)/n)^k goes to 0 rather fast as k goes to infinity.

In fact, the expected number of trials until all numbers are covered is 
~ n log n, and the probability that the process runs significantly 
longer is very small.


See also: http://en.wikipedia.org/wiki/Coupon_collector%27s_problem



How do you know that the Java LCG can produce every 32-bit integer once?


Typically constants are chosen such that this holds, but your code would 
require something stronger to fail, namely, that a certain congruence 
class does not occur. Typically pseudo random number generators are 
chosen such that the generated sequences look close to true randomness. 
If such a simple process can be used to reliably distinguish true 
randomness and the pseudo random number generator, then the pseudo 
random number generator is not very good.



If that's true then
the problem with the Java code was something different and I
was just biased, because I was already expecting the code to
fail before the fact.


Maybe. There is a vast number of ways that this could have failed.


(Expectations can do strange things to your perception.)



Indeed. :)



Re: legacy code retreat's triva game : the D version

2013-12-22 Thread Marco Leise
Am Sun, 22 Dec 2013 09:19:48 +
schrieb Chris Cain clc...@uncg.edu:

 On Sunday, 22 December 2013 at 08:06:30 UTC, Marco Leise wrote:
  Can you elaborate a bit? How do you know that the Java LCG
  can produce every 32-bit integer once? If that's true then
  the problem with the Java code was something different and I
  was just biased, because I was already expecting the code to
  fail before the fact. (Expectations can do strange things to
  your perception.)
 
 If I may,
 
 http://en.wikipedia.org/wiki/Linear_congruential_generator
 
 Definition of an LCG:
 ```
 Xnext = (a * Xprev + c) % m
 ```
 
 An LCG is said to have a full period if the length of the 
 period is m. If the period is m, we know the LCG must produce 
 every number between 0 and m because if there was even one 
 repeated number then the generator as defined above would repeat 
 the entire sequence up to that point and, thus, the period would 
 not be m, which is a contradiction.
 
 According to the Hull-Dobell Theorem, an LCG will have a full 
 period iff:
 1. `c` and `m` are relatively prime.
 For Java, c = 11 and m = 2^48
 This condition applies.
 2. `(a - 1)` is divisible by all prime factors of m`
 For Java, a = 25214903917 and thus a-1 is even which means the 
 prime factors of m (just 2) do divide it.
 This condition applies.
 3. `a - 1` is a multiple of 4 if `m` is a multiple of 4.
 For Java, m is a multiple of 4.
 `(a - 1)/4` is 6303725979, so it's also a multiple of 4.
 This condition applies as well.
 
 Since Java's LCG has a full period over 2^48, we know that taking 
 the top 32 bits (which is what Java does to get better 
 randomness) would also all be represented.


Am Sun, 22 Dec 2013 13:09:51 +0100
schrieb Timon Gehr timon.g...@gmx.ch:

 On 12/22/2013 09:06 AM, Marco Leise wrote:
  Am Sun, 22 Dec 2013 02:12:51 +0100
  schrieb Timon Gehr timon.g...@gmx.ch:
 
  On 12/22/2013 02:09 AM, Timon Gehr wrote:
 
  The morale is that uniform random numbers doesn't imply that
  every value in the range will eventually be generated once!
 
 
  Yes it does. (The probability that some value is never generated is 0.)
  The actual morale is that random number generators do not generate true
  randomness, and poor random number generators may generate sequences
  that do not look remotely random.
 
  'pseudo random number generators' would be a more accurate term.
 
  Can you elaborate a bit?
 
 The probability that a certain number does not occur in one round is 
 (n-1)/n.
 
 ((n-1)/n)^k goes to 0 rather fast as k goes to infinity.
 
 In fact, the expected number of trials until all numbers are covered is 
 ~ n log n, and the probability that the process runs significantly 
 longer is very small.
 
 See also: http://en.wikipedia.org/wiki/Coupon_collector%27s_problem
 
 
  How do you know that the Java LCG can produce every 32-bit integer once?
 
 Typically constants are chosen such that this holds, but your code would 
 require something stronger to fail, namely, that a certain congruence 
 class does not occur. Typically pseudo random number generators are 
 chosen such that the generated sequences look close to true randomness. 
 If such a simple process can be used to reliably distinguish true 
 randomness and the pseudo random number generator, then the pseudo 
 random number generator is not very good.

Thank you two for explaining LCGs to me. That's good
information for reasoning about code. Every good (full period)
LCG is a specific permutation of the numbers [0..m). The next
time I wonder how I can iterate in random order over a list of
length n^2, I know what I'll use ;)

-- 
Marco



Re: legacy code retreat's triva game : the D version

2013-12-21 Thread Ivan Kazmenko

On Saturday, 21 December 2013 at 05:12:57 UTC, Chris Cain wrote:
For more information, I've written a document on an 
implementation of uniform (which should be coming in 2.065, 
btw) which discusses the issue with just using the modulus 
operator:

https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf


Looks like your new implementation has one modulo operator, 
compared to the previous one having two divisions.  That may be 
the cause of speedup.


The previous implementation was, by its looks, copied from C++ 
Boost which also uses two divisions.  Do you know the reason for 
that?  They seem to have been solving the exact same problem 
(strict uniformness provided that the underlying RNG is uniform).


I'd like to touch a relevant point here that matters for me.  In 
a mature randomness library, one important quality is 
reproducibility: there are applications where you want to use 
pseudo-random values, but generate the exact same pseudo-random 
values across different versions, computers, operating systems 
and language implementations.  So far I have seen very few 
languages which provide such reproducibility guarantees for their 
standard library.  For example, in C and C++ standard randomness 
library, the details were implementation-dependent all the way 
until the recent C++11.  Python stood for long but finally broke 
it between 3.1 and 3.2 because of the exact same non-uniformness 
problem.  A positive example in this regard is Java which 
enforces the implementation of Random since at least version 1.5.


If you break the reproducibility of uniform in dmd 2.065, there 
should be at least a note on that in its documentation.  For a 
mature library, I think the old implementation should also have 
been made available somehow. (well, there's always an option to 
include an old library version in your project, but...)  Perhaps 
that's not the case for D and Phobos since they are still not 
stabilized.  Especially so for std.random which is due to more 
breakage anyway because of the value/reference issues with RNG 
types.


Regarding that, I have a point on designing a randomness library. 
 Right now, most of what I have seen has at most two layers: the 
core RNG providing random bits, and the various uses of these 
bits, like uniform distribution on a segment, random shuffle and 
so on.  It is comfortable when the elements of the two layers are 
independent, and you can compose different first layers (LCG, 
MT19937, or maybe some interface to /dev/*random) with different 
second layer functions (uniform[0,9], random_shuffle, etc.).  
Still, many of the useful second level functions build upon 
uniform distribution for integers on a segment.  Thus I would 
like to have an explicit intermediate layer consisting of uniform 
and maybe other distributions which could also have different 
(fast vs. exact) implementations to choose from.  In the long 
run, such design could also solve reproducibility problems: we 
can provide another implementation of uniform as the default, but 
it is still easy to set the previous one as the preferred 
intermediate level.


Ivan Kazmenko.


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread bearophile

Chris Cain:


https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf


From page 6:

size_t[] counts = new size_t[](top);
foreach(i; 0 .. 500_000_000)
counts[uniform(0, top)] += 1;


Modern D allows you to write better code:

size_t[N] counts;
foreach (immutable _; 0 .. 500_000_000)
counts[uniform(0, $)]++;

Bye,
bearophile


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread Meta

On Saturday, 21 December 2013 at 15:03:34 UTC, bearophile wrote:

Chris Cain:


https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf


From page 6:

size_t[] counts = new size_t[](top);
foreach(i; 0 .. 500_000_000)
counts[uniform(0, top)] += 1;


Modern D allows you to write better code:

size_t[N] counts;
foreach (immutable _; 0 .. 500_000_000)
counts[uniform(0, $)]++;

Bye,
bearophile


I know immutable is a good thing, but don't you think `immutable 
_` is a bit unnecessary in this case?


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread bearophile

Meta:

I know immutable is a good thing, but don't you think 
`immutable _` is a bit unnecessary in this case?


Some answer, choose the one you prefer:

1) Yes, it's totally useless because the _ variable is not even 
used inside the loop body! So sorry, I'm always so pedantic.


2) It's necessary, don't you see that? You don't need to mutate 
that variable, so it's better for it be immutable. A simple rule 
to follow is to make const/immutable all variables that don't 
need to mutate, to make code simpler and safer. There's no real 
reason to break that general rule in this case.


3) Just like the integer '5' a range of values as 0 .. 1000 is an 
immutable value. So a variable that scans such range should be 
immutable. If you really want to mutate such variable you should 
add a modifier like mutable or mut or something. Another 
common trap in D coding is iterating on an array of structs with 
foreach, mutating the current struct and forgetting that you are 
mutating only a _copy_ of the items. Unfortunately there is no 
mutable keyword in D, and Walter rejected all this idea. So the 
next best thing it to always put immutable at the foreach 
variable, unless you want to mutate it or if you can't use 
const/immutable for some other reason.


Probably I can invent you more creative answers if you want.

Bear hugs,
bearophile


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread Marco Leise
Am Fri, 20 Dec 2013 15:53:08 +0100
schrieb marcpmichel marc.p.mic...@gmail.com:

 
 I participated in the global day of code retreat 2013, and we 
 had to do refactoring on a very ugly piece of code which was 
 available on many languages.
 But there was no D version, so I made one (based on the java 
 version) and pull-requested it.
 
 Here is the ugly thing :
 https://github.com/jbrains/trivia/tree/master/d
 
 EOT

bool notAWinner;
do {
game.roll(rand.front() % 5 + 1);
rand.popFront();

if (rand.front() % 9 == 7) {// -- WARNING! WARNING!
notAWinner = game.wrongAnswer();
} else {
notAWinner = game.wasCorrectlyAnswered();
}
rand.popFront();
} while (notAWinner);

This kind of code is a dangerous gamble.

This is a story about my student time: I once sat in a Java
class and one of the students had an issue with their code not
outputting anything and not quitting either. When the teacher
came around, we found only one obvious point for an infinite
loop could occur and it looked like this:

  Random rng = new Random();
  int count = 0;

  // Visit all items once
  while (count  list.size()) {
bool found = false;
while (!found) {
  int idx = rng.nextInt() % list.size();
  if (list[idx].visited == false) {
list[idx].visited = true;
found = true;
count++;
  }
}
  }

[I don't remember the exact lines, but this is the gist of it.]
The teacher himself wrote this code and presented it to the
class as a simple way to iterate over a list in random order
which was part of todays programming task.
It didn't cause issues for any of the other students, but on
this particular computer the random seed that the Random ctor
chose caused a degenerate case where it never hit any of the 3
remaining indexes of the list.

The morale is that uniform random numbers doesn't imply that
every value in the range will eventually be generated once!

-- 
Marco



Re: legacy code retreat's triva game : the D version

2013-12-21 Thread Timon Gehr

On 12/22/2013 01:07 AM, Marco Leise wrote:

...
It didn't cause issues for any of the other students, but on
this particular computer the random seed that the Random ctor
chose caused a degenerate case where it never hit any of the 3
remaining indexes of the list.

The morale is that uniform random numbers doesn't imply that
every value in the range will eventually be generated once!



Yes it does. (The probability that some value is never generated is 0.) 
The actual morale is that random number generators do not generate true 
randomness, and poor random number generators may generate sequences 
that do not look remotely random.


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread Timon Gehr

On 12/22/2013 02:09 AM, Timon Gehr wrote:


The morale is that uniform random numbers doesn't imply that
every value in the range will eventually be generated once!



Yes it does. (The probability that some value is never generated is 0.)
The actual morale is that random number generators do not generate true
randomness, and poor random number generators may generate sequences
that do not look remotely random.


'pseudo random number generators' would be a more accurate term.


Re: legacy code retreat's triva game : the D version

2013-12-21 Thread ilya-stromberg

On Saturday, 21 December 2013 at 20:43:27 UTC, bearophile wrote:

3) Just like the integer '5' a range of values as 0 .. 1000 is 
an immutable value. So a variable that scans such range should 
be immutable. If you really want to mutate such variable you 
should add a modifier like mutable or mut or something. 
Another common trap in D coding is iterating on an array of 
structs with foreach, mutating the current struct and 
forgetting that you are mutating only a _copy_ of the items. 
Unfortunately there is no mutable keyword in D, and Walter 
rejected all this idea. So the next best thing it to always put 
immutable at the foreach variable, unless you want to mutate 
it or if you can't use const/immutable for some other reason.


Why did Walter reject this idea?
BTW, we don't need `mutable` keyword to implement this idea. We 
should just deny any mutation of item copy. If you really need to 
store temporary result, add new variable. For example:


foreach(i; arr)
{
   ++i; //error - this variable contains copy of data, not a ref 
to the original data


   auto temp_i = i + 1; //OK
}

We already have similar errors, for example:

void foo()
{
   int i;
   i; //Error: var has no effect in expression (i)
}


Re: legacy code retreat's triva game : the D version

2013-12-20 Thread bearophile

marcpmichel:


Here is the ugly thing :
https://github.com/jbrains/trivia/tree/master/d


And wrong:


if (rand.front() % 9 == 7) {


Bye,
bearophile


Re: legacy code retreat's triva game : the D version

2013-12-20 Thread marcpmichel

On Friday, 20 December 2013 at 15:05:07 UTC, bearophile wrote:

marcpmichel:


Here is the ugly thing :
https://github.com/jbrains/trivia/tree/master/d


And wrong:


if (rand.front() % 9 == 7) {


Bye,
bearophile


Do you mean I should have used :
if (uniform(0,10) == 7) {
instead ?


Re: legacy code retreat's triva game : the D version

2013-12-20 Thread bearophile

marcpmichel:


Do you mean I should have used :
if (uniform(0,10) == 7) {
instead ?


Right. Using % introduces a bias.

Bye,
bearophile


Re: legacy code retreat's triva game : the D version

2013-12-20 Thread Chris Cain

On Friday, 20 December 2013 at 16:20:44 UTC, marcpmichel wrote:

On Friday, 20 December 2013 at 15:05:07 UTC, bearophile wrote:

marcpmichel:


Here is the ugly thing :
https://github.com/jbrains/trivia/tree/master/d


And wrong:


if (rand.front() % 9 == 7) {


Bye,
bearophile


Do you mean I should have used :
if (uniform(0,10) == 7) {
instead ?


TL;DR version:
Actually, the equivalent would be uniform(0, 9), but yes, that'd 
be the preferable approach there. (also note 
https://github.com/jbrains/trivia/blob/7b473f9fbbd125b0ab1c2e82582b8a8c414ca501/d/source/trivia.d#L19 
too should be changed to `uniform(1, 6)` which will give numbers 
in the range [1 .. 6) ... that's what you want, right?)


Long version:
For more information, I've written a document on an 
implementation of uniform (which should be coming in 2.065, btw) 
which discusses the issue with just using the modulus operator:

https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf

Generally speaking, this new uniform will be _extremely_ close to 
the same speed of just using the modulus operator, but avoids the 
bias issue. I think there is no real good reason to not use the 
standard function anymore.


That said, the bias with such a small number (9) won't be 
significant. If rand gives you a uniform 32-bit number, then the 
distribution of rand % 9 will be [477218589, 477218589, 
477218589, 477218589, 477218588, 477218588, 477218588, 477218588, 
477218588] (notice how the first 4 have 1 more occurrence than 
the rest?)... so the bias is miniscule in this case.


The bias issue matters a lot more with larger numbers where some 
numbers could actually occur twice as often as others, or if your 
application demands high quality random numbers (think gambling 
games). Related to those reasons, even if your code doesn't use 
large numbers and isn't used for a gambling game now, it's still 
possible for it to eventually be used for such things (or to 
influence others to follow your example for the bad situations). 
For those reasons alone, it's pretty important to get in the 
habit of using the standard function.


But that's not all since the standard function is probably easier 
to read, too. Let's say you want to emulate a standard 6-sided 
die. If you want numbers in the range [1..6] (note inclusive 
bounds) it's easier to see immediately when you say 
`uniform![](1, 6)' rather than `rand % 6 + 1`


That's probably all TMI, but maybe all of that will be useful for 
you.