Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Sturla Molden
Yaroslav Halchenko  wrote:

> it is in src/main/RNG.c (ack is nice ;) )... from visual inspection looks 
> "matching"

I see... It's a rather vanilla Mersenne Twister, and it just use 32 bits of
randomness. An signed int32 is multiplied by 2.3283064365386963e-10 to
scale it to [0,1). Then they also have a function called "fixup" that
prevents 0 or 1 from being returned. If the generated random value x <= 0.0
they return 0.5*2.328306437080797e-10, and if (1.0-x)<=0.0 they return
1.0-0.5*2.328306437080797e-10. 

http://svn.r-project.org/R/branches/R-3-1-branch/src/main/RNG.c

Also note that the algorithm used to set the seed is important. If you set
a seed equal 1 in NumPy, that might not mean the same in R. And then of
course there is the algorithm used to sample integers. 

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Robert Kern
On Mon, Apr 7, 2014 at 3:16 PM, Daπid  wrote:
>
> On Apr 7, 2014 3:59 AM, "Yaroslav Halchenko"  wrote:
>> so I would assume that the devil is indeed in R post-processing and would
>> look
>> into it (if/when get a chance).
>
> The devil here is the pigeon and the holes problem. Mersenne Twister
> generates random integers in a certain range. The output is guaranteed to be
> deterministic, uniform, and reproducible.
>
> But when you want to cast those m possible input in n possible outputs, you
> need to do magic (or maths) to keep the new distribution truly uniform.
> Simply getting random bytes and viewing them as ints will give you low
> quality random numbers.
>
> The algorithm that casts MT output to a random integer is probably what is
> different, and perhaps you could find it documented somewhere.

This is ours:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/randomkit.c#L259

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Daπid
On Apr 7, 2014 3:59 AM, "Yaroslav Halchenko"  wrote:
> so I would assume that the devil is indeed in R post-processing and would
look
> into it (if/when get a chance).

The devil here is the pigeon and the holes problem. Mersenne Twister
generates random integers in a certain range. The output is guaranteed to
be deterministic, uniform, and reproducible.

But when you want to cast those m possible input in n possible outputs, you
need to do magic (or maths) to keep the new distribution truly uniform.
Simply getting random bytes and viewing them as ints will give you low
quality random numbers.

The algorithm that casts MT output to a random integer is probably what is
different, and perhaps you could find it documented somewhere.

As a side note, C++11 includes in the standard a MT RNG guaranteed to be
the same across implementations, but there is no promise about the
distributions -not even across versions of the same implementation.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Yaroslav Halchenko

On Mon, 07 Apr 2014, Sturla Molden wrote:

> > so I would assume that the devil is indeed in R post-processing and would 
> > look
> > into it (if/when get a chance).

> I tried to look into the R source code. It's the worst mess I have ever
> seen. I couldn't even find their Mersenne twister.

it is in src/main/RNG.c (ack is nice ;) )... from visual inspection looks 
"matching"

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Sturla Molden
Yaroslav Halchenko  wrote:

> so I would assume that the devil is indeed in R post-processing and would look
> into it (if/when get a chance).

I tried to look into the R source code. It's the worst mess I have ever
seen. I couldn't even find their Mersenne twister.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-06 Thread Yaroslav Halchenko

On Sun, 06 Apr 2014, Sturla Molden wrote:
> > R, Python std library, numpy all have Mersenne Twister RNG implementation.  
> > But
> > all of them generate different numbers.  This issue was previously 
> > discussed in
> > https://github.com/numpy/numpy/issues/4530 :  In Python, and numpy generated
> > numbers are based on using 53 bits of two 32 bit random integers generated 
> > by
> > the algorithm (see below).Upon my brief inspection, original 32bit 
> > numbers
> > are nohow available for access neither in NumPy nor in Python stdlib
> > implementation.

> NumPy uses the Randomkit library. The source is here:

> https://github.com/numpy/numpy/tree/master/numpy/random/mtrand

> It very easy to modify to produce whatever result you want. You can build
> it separately from NumPy.

> RandomState.bytes and np.random.bytes gives you the original random bytes
> in little endian order. It basically calles rk_fill in Randomkit and then
> unpacks the 32 bit random integer into four bytes. Just view it as
> dtype=' a = np.random.bytes(4*n).view(dtype='> np.random.seed(1); [int(np.asscalar(np.fromstring(np.random.bytes(4), 
>> dtype='http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-06 Thread Sturla Molden
> a = np.random.bytes(4*n).view(dtype='> 5).astype(np.int32)
b = (np.random.bytes(4*n).view(dtype='> 6).astype(np.int32)
r = (a * 67108864.0 + b) / 9007199254740992.0

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-06 Thread Sturla Molden
Yaroslav Halchenko  wrote:

> R, Python std library, numpy all have Mersenne Twister RNG implementation.  
> But
> all of them generate different numbers.  This issue was previously discussed 
> in
> https://github.com/numpy/numpy/issues/4530 :  In Python, and numpy generated
> numbers are based on using 53 bits of two 32 bit random integers generated by
> the algorithm (see below).Upon my brief inspection, original 32bit numbers
> are nohow available for access neither in NumPy nor in Python stdlib
> implementation.

NumPy uses the Randomkit library. The source is here:

https://github.com/numpy/numpy/tree/master/numpy/random/mtrand

It very easy to modify to produce whatever result you want. You can build
it separately from NumPy.

RandomState.bytes and np.random.bytes gives you the original random bytes
in little endian order. It basically calles rk_fill in Randomkit and then
unpacks the 32 bit random integer into four bytes. Just view it as
dtype='http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] match RNG numbers with R?

2014-04-06 Thread Yaroslav Halchenko
Hi NumPy gurus,

We wanted to test some of our code by comparing to results of R
implementation which provides bootstrapped results.

R, Python std library, numpy all have Mersenne Twister RNG implementation.  But
all of them generate different numbers.  This issue was previously discussed in
https://github.com/numpy/numpy/issues/4530 :  In Python, and numpy generated
numbers are based on using 53 bits of two 32 bit random integers generated by
the algorithm (see below).Upon my brief inspection, original 32bit numbers
are nohow available for access neither in NumPy nor in Python stdlib
implementation.

I wonder if I have missed something and there is an easy way (i.e. without
reimplementing core algorithm, or RPy'ing numbers from R) to generate random
numbers in Python to match the ones in R?

Excerpt from
http://nbviewer.ipython.org/url/www.onerussian.com/tmp/random_randomness.ipynb

# R
%R RNGkind("Mersenne-Twister"); set.seed(1); sample(0:9, 10, replace=T)

array([2, 3, 5, 9, 2, 8, 9, 6, 6, 0], dtype=int32)

# stock Python
random.seed(1); [random.randint(0, 10) for i in range(10)]

[1, 9, 8, 2, 5, 4, 7, 8, 1, 0]

# numpy
rng = nprandom.RandomState(1);  [rng.randint(0, 10) for i in range(10)]

[5, 8, 9, 5, 0, 0, 1, 7, 6, 9]



-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion