Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-20 Thread Sturla Molden

On 20/09/15 21:48, Sturla Molden wrote:


This is where a small subset of C++ would be handy. Making an uint128_t
class with overloaded operators is a nobrainer. :-)


Meh... The C++ version of PCG already has this.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-20 Thread Sturla Molden

On 19/09/15 18:06, Robert Kern wrote:


That said, we'd
probably end up doing a significant amount of rewriting so that we will
have a C implementation of software-uint128 arithmetic.


This is where a small subset of C++ would be handy. Making an uint128_t 
class with overloaded operators is a nobrainer. :-)


Not that we will ever use C++ in NumPy, but we do have a fair amount of 
C++ in SciPy.



Sturla



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-19 Thread Robert Kern
On Sat, Sep 19, 2015 at 2:26 PM, Antoine Pitrou  wrote:
>
> On Mon, 14 Sep 2015 19:02:58 +0200
> Sturla Molden  wrote:
> > On 14/09/15 10:34, Antoine Pitrou wrote:
> >
> > > If Numpy wanted to switch to a different generator, and if Numba
wanted
> > > to remain compatible with Numpy, one of the PCG functions would be an
> > > excellent choice (also for CPU performance, incidentally).
> >
> > Is Apache license ok in NumPy?
>
> While I don't know Numpy's policy precisely, I have no idea why a
> modern non-copyleft license such as Apache wouldn't be ok.

GPLv2 incompatibility: http://www.apache.org/licenses/GPL-compatibility.html

> That said, PCG looks simple enough that you can reimplement it (or at
> least the variants that are of interest to you) independently without
> too much effort.

The author has agreed in principle to relicense the code as MIT, though has
not yet merged the PR that accomplishes this yet. That said, we'd probably
end up doing a significant amount of rewriting so that we will have a C
implementation of software-uint128 arithmetic.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-19 Thread Antoine Pitrou
On Mon, 14 Sep 2015 19:02:58 +0200
Sturla Molden  wrote:
> On 14/09/15 10:34, Antoine Pitrou wrote:
> 
> > If Numpy wanted to switch to a different generator, and if Numba wanted
> > to remain compatible with Numpy, one of the PCG functions would be an
> > excellent choice (also for CPU performance, incidentally).
> 
> Is Apache license ok in NumPy?

While I don't know Numpy's policy precisely, I have no idea why a
modern non-copyleft license such as Apache wouldn't be ok.

That said, PCG looks simple enough that you can reimplement it (or at
least the variants that are of interest to you) independently without
too much effort.

Regards

Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-18 Thread Nathaniel Smith
On Mon, Sep 14, 2015 at 7:56 AM, Sturla Molden  wrote:
> On 14/09/15 10:34, Antoine Pitrou wrote:
>
>> Currently we don't provide those APIs on the GPU, since MT is much too
>> costly there.
>>
>> If Numpy wanted to switch to a different generator, and if Numba wanted
>> to remain compatible with Numpy, one of the PCG functions would be an
>> excellent choice (also for CPU performance, incidentally).
>
>
> We have discussed allowing plugable PRNGs in NumPy on multiple occations.
>
> I don't think NumPy should change its default PRNG. The Mersenne Twister
> MT19937 is state of the art of numerical work. It has excellent numerical
> accuracy, a huge period, and is very fast. For most users of NumPy, the
> MT19937 is the first and last word that needs to be said about pseudorandom
> number generators.

There are obviously trade-offs here and I haven't looked at the
details enough to have an opinion on whether the benefits are *enough*
that changing the default would be worth it, but it's simply not true
that MT19937 is state of the art. There are faster generators with
better numerical properties and more features (including, but not
only, the PCG family).

It looks like the random123 Philox generator is somewhat popular on
GPUs at this point too (e.g.
https://uk.mathworks.com/help/distcomp/examples/generating-random-numbers-on-a-gpu.html).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-18 Thread Sturla Molden

On 14/09/15 10:26, Robert Kern wrote:


I want fast, multiple independent streams on my
current hardware first, and PCG gives that to me.


DCMT is good for that as well.

It should be possible to implement a pluggable design of NumPy's mtrand. 
Basically call a function pointer instead of rk_double. That way any 
(P)RNG can be plugged in. Then we could e.g. allow a Python callable, a 
Cython class or an f2py function as callback. We could ship multiple 
PRNGs with NumPy, and we could allow users to supply their own.


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-18 Thread Sturla Molden

On 14/09/15 10:34, Antoine Pitrou wrote:


If Numpy wanted to switch to a different generator, and if Numba wanted
to remain compatible with Numpy, one of the PCG functions would be an
excellent choice (also for CPU performance, incidentally).


Is Apache license ok in NumPy?

(Not sure, thus asking. The PCG suite is Apache licensed.)





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-18 Thread Sturla Molden

On 14/09/15 10:34, Antoine Pitrou wrote:


Currently we don't provide those APIs on the GPU, since MT is much too
costly there.

If Numpy wanted to switch to a different generator, and if Numba wanted
to remain compatible with Numpy, one of the PCG functions would be an
excellent choice (also for CPU performance, incidentally).


We have discussed allowing plugable PRNGs in NumPy on multiple occations.

I don't think NumPy should change its default PRNG. The Mersenne Twister 
MT19937 is state of the art of numerical work. It has excellent 
numerical accuracy, a huge period, and is very fast. For most users of 
NumPy, the MT19937 is the first and last word that needs to be said 
about pseudorandom number generators. We have the default that most 
users of NumPy will ever want, unless severe flaws are discovered in the 
algorithm (not very likely).


But there are occations where it does not work well. For example for 
parallel simulations it is (sometimes) nice to use DCMT instead of a 
vanilla MT19937. There are a lot of other generators that NumPy should 
consider as well, including the one you mention and Marsaglia's 
generators. We need to refactor randomkit to use a plugable entropy 
source instead of its internal MT19937. This is actually very easy to 
implement. In that case a user could just provide a Cython class or 
callback function (C, Fortran or Python) for generating a random ulong32.


The discussion on python-ideas refers to cryptography. That is not 
equally relevant for NumPy. But with a plugable design we can let NumPy 
users use os.urandom as well.



Sturla


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-14 Thread Antoine Pitrou
On Mon, 14 Sep 2015 09:26:58 +0100
Robert Kern  wrote:
> 
> Actually, I meant all of the crap *around* it, the platform-compatibility
> testing to see if you have such a hardware instruction or not, and C++
> template shenanigans in the surrounding code. It's possible that the
> complexity is only due to flexibility, but it was too complex for me to
> begin understanding *why* it's so complex before I succumbed to ennui and
> moved on to some other productive use of my free time. At least some of the
> complexity is due to needing software implementations of reduced-round
> crypto for decent performance in the absence of the hardware instruction.
> Performing well in the absence of the hardware instruction is very
> important to me as I do not seem to have the AES-NI instruction available
> on my mid-2012 Macbook Pro. Exposing counter-mode AES128 as a core PRNG is
> a nice idea, but it's just low on my wishlist. I want fast, multiple
> independent streams on my current hardware first, and PCG gives that to me.

Using AES also means emulating it on a GPU will be quite hard.

For the record, Numba is currently using a Mersenne Twister on the CPU,
to emulate Numpy's behaviour (although some of our distributions may be
different):

>>> def f(x):
... np.random.seed(x)
... l = []
... for i in range(10): l.append(np.random.random())
... return l
... 
>>> g = numba.jit(nopython=True)(f)
>>> f(10)
[0.771320643266746, 0.0207519493594015, 0.6336482349262754,
0.7488038825386119, 0.4985070123025904, 0.22479664553084766,
0.19806286475962398, 0.7605307121989587, 0.16911083656253545,
0.08833981417401027]
>>> g(10)
[0.771320643266746, 0.0207519493594015, 0.6336482349262754,
0.7488038825386119, 0.4985070123025904, 0.22479664553084766,
0.19806286475962398, 0.7605307121989587, 0.16911083656253545,
0.08833981417401027]


Currently we don't provide those APIs on the GPU, since MT is much too
costly there.

If Numpy wanted to switch to a different generator, and if Numba wanted
to remain compatible with Numpy, one of the PCG functions would be an
excellent choice (also for CPU performance, incidentally).

Regards

Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-14 Thread Robert Kern
[Tim, ping me if you want to get dropped from the reply chain, as we are
liable to get more into numpy decision-making. I've dropped python-ideas.]

On Mon, Sep 14, 2015 at 4:34 AM, Tim Peters  wrote:
>
> [Robert Kern ]
> >> ...
> >> I'll also recommend the PCG paper (and algorithm) as the author's
> >> cross-PRNGs comparisons have been bandied about in this thread
already. The
> >> paper lays out a lot of the relevant issues and balances the various
> >> qualities that are important: statistical quality, speed, and security
(of
> >> various flavors).
> >>
> >>   http://www.pcg-random.org/paper.html
> >>
> >> I'm actually not that impressed with Random123. The core idea is nice
and
> >> clean, but the implementation is hideously complex.
>
> [Nathaniel Smith ]
> > I'm curious what you mean by this? In terms of the code, or...?

In terms of the code that would be necessary to implement this in a
simultaneously performant and cross-platform way in numpy.random.

> > I haven't looked at the code,

I recommend it! Approach it with the practical goal of "how do I stick this
in as a new core PRNG for numpy.random?"

> > but the simplest generator they
> > recommend in the paper is literally just
> >
> > def rng_stream(seed):
> > counter = 0
> > while True:
> > # AES128 takes a 128 bit key and 128 bits of data and returns
> > 128 bits of encrypted data
> > val = AES128(key=seed, data=counter)
> > yield low_64_bits(val)
> > yield high_64_bits(val)
> > counter += 1
>
> I assume it's because if you expand what's required to _implement_
> AES128() in C, it does indeed look pretty hideously complex.  On HW
> implementing AES primitives, of course the code can be much simpler.
>
> But to be fair, if integer multiplication and/or addition weren't
> implemented in HW, and we had to write to C code emulating them via
> bit-level fiddling, the code for any of the PCG algorithms would look
> hideously complex too.
>
> But being fair isn't much fun ;-)

Actually, I meant all of the crap *around* it, the platform-compatibility
testing to see if you have such a hardware instruction or not, and C++
template shenanigans in the surrounding code. It's possible that the
complexity is only due to flexibility, but it was too complex for me to
begin understanding *why* it's so complex before I succumbed to ennui and
moved on to some other productive use of my free time. At least some of the
complexity is due to needing software implementations of reduced-round
crypto for decent performance in the absence of the hardware instruction.
Performing well in the absence of the hardware instruction is very
important to me as I do not seem to have the AES-NI instruction available
on my mid-2012 Macbook Pro. Exposing counter-mode AES128 as a core PRNG is
a nice idea, but it's just low on my wishlist. I want fast, multiple
independent streams on my current hardware first, and PCG gives that to me.

On the "if I had to implement all of the bit-fiddling myself" count, I'd
still prefer PCG. It has a similar problem with 128-bit integers that may
or may not be natively provided (should you want a 128-bit state), and I
have to say that the software implementation of that bit-fiddling is much
nicer to work with than reduced-round Threefry. It's reference
implementation is also overly complex for practical use as it is showing
off the whole parameterized PCG family to support the paper, even the
explicitly unrecommended members. But it's relatively straightforward to
disentangle the desirable members; their implementations, including the
supporting 128-bit arithmetic code, are small and clean.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-13 Thread Nathaniel Smith
[This is getting fairly off-topic for python-ideas (since AFAICT there
is no particular reason right now to add a new deterministic generator
to the stdlib), so CC'ing numpy-discussion and I'd suggest followups
be directed to there alone.]

On Thu, Sep 10, 2015 at 10:41 AM, Robert Kern  wrote:
> On 2015-09-10 04:56, Nathaniel Smith wrote:
>>
>> On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters  wrote:
>>>
>>> There are some clean and easy approaches to this based on
>>> crypto-inspired schemes, but giving up crypto strength for speed.  If
>>> you haven't read it, this paper is delightful:
>>>
>>>  http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
>>
>>
>> It really is! As AES acceleration instructions become more common
>> (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
>> just using AES in CTR mode becomes pretty compelling -- it's fast,
>> deterministic, provably equidistributed, *and* cryptographically
>> secure enough for many purposes.
>
>
> I'll also recommend the PCG paper (and algorithm) as the author's
> cross-PRNGs comparisons have been bandied about in this thread already. The
> paper lays out a lot of the relevant issues and balances the various
> qualities that are important: statistical quality, speed, and security (of
> various flavors).
>
>   http://www.pcg-random.org/paper.html
>
> I'm actually not that impressed with Random123. The core idea is nice and
> clean, but the implementation is hideously complex.

I'm curious what you mean by this? In terms of the code, or...?

I haven't looked at the code, but the simplest generator they
recommend in the paper is literally just

def rng_stream(seed):
counter = 0
while True:
# AES128 takes a 128 bit key and 128 bits of data and returns
128 bits of encrypted data
val = AES128(key=seed, data=counter)
yield low_64_bits(val)
yield high_64_bits(val)
counter += 1

which gives a 64-bit generator with a period of 2^129. They benchmark
it as faster than the Mersenne Twister on modern CPUs (<2
cycles-per-byte on recent x86, x86-64, ARMv8), it requires less
scratch space, is incredibly simple to work with -- you can
parallelize it, get independent random streams, etc., in a totally
trivial way -- and has a *way* stronger guarantee of
random-looking-ness than merely passing TestU01.

The downsides are that it does still require 176 bytes of read-only
scratch storage (used to cache an expanded version of the "key"), the
need for a modern CPU to get that speed, and that it doesn't play well
with GPUs. So they also provide a set of three more ad hoc generators
designed to solve these problems. I'm not as convinced about these,
but hey, they pass TestU01...

The PCG paper does a much better job of all the other stuff *around*
making a good RNG -- it has the nice website, clear comparisons, nice
code, etc. -- which is definitely important. But to me the increase in
speed and reduction in memory use doesn't seem worth it given how fast
these generators are to start with + the nice properties of counter
mode + and cryptographic guarantees of randomness that you get from
AES, for code that's generally targeting non-embedded non-GPU systems.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion