Re: [Haskell-cafe] why is Random in System?

2011-08-19 Thread Ryan Newton
 Yep, but don't conflate determinism with splitting. In the imperative
 world, you normally know how many CPUs you have, so you initialize one PRNG
 per CPU, and simply go from there; there's no need for splitting. In the
 parallel community, people are going out of their way to *avoid*
  splitting.


I'm having trouble thinking of scenarios where per-CPU does the trick.  Do
you mean one per pthread rather than one per CPU?

In the Cilk case, you've got to deal with work stealing of course.  So you
want rand() to generate a result that is determined by the current
stack-frame's position in the binary-tree of spawns.  In the work I was
referring to:


http://groups.csail.mit.edu/sct/wiki/index.php?title=Other_Projects#Deterministic_Parallel_Random-Number_Generation

... they try a bunch of different methods and I can't remember if any of
them split the RNG eagerly as they go down the spawn tree or if they just
record the tree-index on the way down and then read it out when they
generated randoms.  (I think the latter.)

Cheers,
  -Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Ertugrul Soeylemez
Brandon Allbery allber...@gmail.com wrote:

   I've noticed there's a convention to put modules having to deal
   with randomness into System.Random.  I thought System was for OS
   interaction?  Granted getting a random seed usually means going to
   the OS, but isn't the rest of it, like generating random
   sequences, distributions, selecting based on probability,
   shuffling, etc. all non-OS related algorithms?
 
  System definitely does seem like an odd choice.  In most cases the
  only interaction any PRNG, even when accessed via the FFI, has with
  the system is - as you say - to get an initial seed value for a
  global instance.

 I'd be tempted to guess that the whole reason it's under System is the
 IO component.

That's not really valid, is it?  After all the new 'time' package is
also stationed under the Data tree, and it has a similarly large IO
component.  I have to say, it seems very intuitive to me to look for it
under Data, even though I'm not sure why.  Probably I'm just used to it.
Time has a strong connection to the operating system and the hardware,
so it could just as well go into the System tree.  For
(non-cryptographic) randomness however we are dealing with numerical
data, for which the connection to the system is mere convenience, so I
wouldn't mind finding it under Data at all.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Ryan Newton
Hi all,

I'm the maintainer of random.  If people could decide on what the
alternative name would be we could put it through the library proposal
process.  It seems that one problem at this moment is the lack of a single,
clear right answer.  Replacing one debatable not-quite-right choice with
another may not be satisfying ;-).

Also, what Thomas says is right.  The current implementation is SLOW and
WEAK, which would not seem to make a good default implementation.  The goal
is to replace it with something better so that the default random package is
strong in at least one dimension.  I think this is important because I
imagine many people use the default package, for example because they don't
want to scour hackage and try all the alternatives.

My proposal for this has been to use AES based crypto-prng.  I think that is
fast enough (i.e. faster than what's currently there), very strong, and
splittable.  New Intel and AMD hardware has hardware support for AES which
makes it even faster.  The intel-aes package provides this functionality,
with and without hardware support.  But there's work left to do in terms of
testing, making sure its cross platform, etc.  Anyone who's interested in
helping (especially with Windows support) would be warmly welcomed!

Cheers,
  -Ryan



On Wed, Aug 17, 2011 at 4:46 AM, Ertugrul Soeylemez e...@ertes.de wrote:

 Brandon Allbery allber...@gmail.com wrote:

I've noticed there's a convention to put modules having to deal
with randomness into System.Random.  I thought System was for OS
interaction?  Granted getting a random seed usually means going to
the OS, but isn't the rest of it, like generating random
sequences, distributions, selecting based on probability,
shuffling, etc. all non-OS related algorithms?
  
   System definitely does seem like an odd choice.  In most cases the
   only interaction any PRNG, even when accessed via the FFI, has with
   the system is - as you say - to get an initial seed value for a
   global instance.
 
  I'd be tempted to guess that the whole reason it's under System is the
  IO component.

 That's not really valid, is it?  After all the new 'time' package is
 also stationed under the Data tree, and it has a similarly large IO
 component.  I have to say, it seems very intuitive to me to look for it
 under Data, even though I'm not sure why.  Probably I'm just used to it.
 Time has a strong connection to the operating system and the hardware,
 so it could just as well go into the System tree.  For
 (non-cryptographic) randomness however we are dealing with numerical
 data, for which the connection to the system is mere convenience, so I
 wouldn't mind finding it under Data at all.


 Greets,
 Ertugrul


 --
 nightmare = unsafePerformIO (getWrongWife = sex)
 http://ertes.de/



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Ertugrul Soeylemez
Ryan Newton rrnew...@gmail.com wrote:

 I'm the maintainer of random.  If people could decide on what the
 alternative name would be we could put it through the library proposal
 process.  It seems that one problem at this moment is the lack of a
 single, clear right answer.  Replacing one debatable not-quite-right
 choice with another may not be satisfying ;-).

 Also, what Thomas says is right.  The current implementation is SLOW
 and WEAK, which would not seem to make a good default implementation.
 The goal is to replace it with something better so that the default
 random package is strong in at least one dimension.  I think this is
 important because I imagine many people use the default package, for
 example because they don't want to scour hackage and try all the
 alternatives.

 My proposal for this has been to use AES based crypto-prng.  I think
 that is fast enough (i.e. faster than what's currently there), very
 strong, and splittable.  New Intel and AMD hardware has hardware
 support for AES which makes it even faster.  The intel-aes package
 provides this functionality, with and without hardware support.  But
 there's work left to do in terms of testing, making sure its cross
 platform, etc.  Anyone who's interested in helping (especially with
 Windows support) would be warmly welcomed!

Using a cryptographically strong random number generator here is
probably a very bad idea.  Two reasons:

Firstly while being faster than the current implementation an AES-based
implementation will still be considerably slower than the Mersenne
Twister algorithm.  This may or may not be true, if hardware AES support
is there, but don't just assume that everybody has AES instructions now.
For example I don't have them.

Secondly there is no standard requiring that the default random number
generator is cryptographically safe.  Changing this particular
implementation, which is the one most people use, to a CSPRNG will make
people take for granted that System.Random is safe to use in
security-related products, because it would be very convenient.  This
will render strong security products trivially weak, when compiled with
the wrong Haskell distribution, and you will find packages with
statements like:  We assume that you use Ryan Newton's distribution of
the random package.

I would rather propose the Mersenne Twister as the default random number
generator.  You could add AES as a secondary generator for people
requiring cryptographic strength, but then do it properly, i.e. impure,
because most people, when reading about a PRNG with AES anywhere in
its name, will just assume that it's a CSPRNG.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Bryan O'Sullivan
On Wed, Aug 17, 2011 at 8:56 AM, Ryan Newton rrnew...@gmail.com wrote:


 I'm the maintainer of random.  If people could decide on what the
 alternative name would be we could put it through the library proposal
 process.  It seems that one problem at this moment is the lack of a single,
 clear right answer.  Replacing one debatable not-quite-right choice with
 another may not be satisfying ;-).


The entire premise of that discussion seems quite ridiculous. All that
renaming the modules will achieve is the breakage of several hundred
packages. The roots of the module naming hierarchy - Control, Data, System,
and so on - are so muddled and inconsistently used that the best advice you
could give to people who raise this topic is to pretend those roots are
simply not there.

My proposal for this has been to use AES based crypto-prng.


We'd be better off if you could seek consensus from PRNG maintainers on a
fixed-up Random class before attacking this problem, so that we'd have a
better chance of achieving cross-package compatibility.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Ryan Newton
The problem with Mersenne twister is that it doesn't split well.  The main
reason for crypto prng in this package would not be to advertise to people
that System.Random can be used for security-related apps *but to make
splitting reasonably safe*.  It's not good enough to have a known-bad
generator under splitting provided as the default.  And I think we need
splitting, especially as more Haskell programs become parallel.  Would it
address your concerns to not mention the crypto nature of the standard
implementation in the System.Random documentation?

I think there's also a reasonable argument to lean towards *correctness* over
performance in Haskell's defaults.  For example, unconstrained Num bindings
default to Integer and likewise random numbers could be as strong as
possible by default, and those looking for raw rands/sec throughput could
make other informed choices.

I had thought that maybe we could bifurcate the stdgen concept into a fast
and a strong version, which could be say Mersenne Twister and AES
respectively.  But, again, the problem comes if the fast version is expected
to be splittable as well.

With SplittableGen factored out from RandomGen I suppose it would be
possible for the fast version to NOT offer splitting.  However, Mersenne
Twister is best used with an imperative interface, you can see the tension
in the pure version of the mersenne package on hackage:

  http://hackage.haskell.org/package/mersenne-random-pure64-0.2.0.3

Please also see Burton Smith's comments below in response to my proposal to
offer a MT + AES combination.

Best,
  -Ryan

-- Forwarded message --
From: Burton Smith burt...@microsoft.com
Date: Tue, Jun 28, 2011 at 1:28 PM
Subject: RE: AESNI-based splittable random number generation for Haskell
To: Newton, Ryan R ryan.r.new...@intel.com

Mersenne Twister (MT)is a poor choice in my opinion.  First, the generator
state is large (2496 bytes) and it must be copied on each call to next.
 Split is worse; it will generate twice as many bytes per call as next will.

Second, I see no good way to guarantee independence of the two generators
emanating from a split.  MT is hard to initialize anyway, and giant-stepping
it to define the newly split generator (as we did back in the 80's paper) is
not only hard for an LFSR like MT but, worse yet, it doesn't work for
Haskell or other fine-grain concurrent languages because split and next will
commute.  Other tree RNGs, e.g. SPRNG, have the same commutativity issue.
 Block ciphers address this issue head-on by reducing the split independence
problem to a crypto problem.

A better choice might be some block cipher other than AES.  Two
possibilities are XTEA and RC4.  Both are in Wikipedia.  RC4 has 256 bytes
of key state, still bigger than I would like.

Another scheme is to make the number of rounds an option.  With AESNI, this
could scream.

Burton


On Wed, Aug 17, 2011 at 12:26 PM, Ertugrul Soeylemez e...@ertes.de wrote:

 Ryan Newton rrnew...@gmail.com wrote:



 Using a cryptographically strong random number generator here is
 probably a very bad idea.  Two reasons:

 Firstly while being faster than the current implementation an AES-based
 implementation will still be considerably slower than the Mersenne
 Twister algorithm.  This may or may not be true, if hardware AES support
 is there, but don't just assume that everybody has AES instructions now.
 For example I don't have them.

 Secondly there is no standard requiring that the default random number
 generator is cryptographically safe.  Changing this particular
 implementation, which is the one most people use, to a CSPRNG will make
 people take for granted that System.Random is safe to use in
 security-related products, because it would be very convenient.  This
 will render strong security products trivially weak, when compiled with
 the wrong Haskell distribution, and you will find packages with
 statements like:  We assume that you use Ryan Newton's distribution of
 the random package.

 I would rather propose the Mersenne Twister as the default random number
 generator.  You could add AES as a secondary generator for people
 requiring cryptographic strength, but then do it properly, i.e. impure,
 because most people, when reading about a PRNG with AES anywhere in
 its name, will just assume that it's a CSPRNG.


 Greets,
 Ertugrul


 --
 nightmare = unsafePerformIO (getWrongWife = sex)
 http://ertes.de/



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Bryan O'Sullivan
On Wed, Aug 17, 2011 at 11:10 AM, Ryan Newton rrnew...@gmail.com wrote:

 The problem with Mersenne twister is that it doesn't split well.  The main
 reason for crypto prng in this package would not be to advertise to people
 that System.Random can be used for security-related apps *but to make
 splitting reasonably safe*.


The more fundamental problem is that splitting is neither well understood
nor generally safe, and as such it should not be in the basic Random class.
A more sensible API would have a Random class that lacks a split operation,
and a SplittableRandom class that permits it, as you mention later in your
message. Most current PRNGs could then be instances of Random, but not
SplittableRandom.

And I think we need splitting, especially as more Haskell programs become
 parallel.


I do not agree here, I'm afraid.

By the way, my mwc-random package is at least as fast as mersenne-twister,
has smaller state, and is pure Haskell.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Ryan Newton

 The more fundamental problem is that splitting is neither well understood
 nor generally safe, and as such it should not be in the basic Random class.


Would you mind elaborating?  Splitting was not well-understood by the
original authors of System.Random; that much is in the comments.  Nor is it
well understood by me.  But I am under the impression that it is well
understood by Burton Smith and others who have worked on the topic, and that
they assure us that using AES, RNG's under any series of splits are as
strong as those generated in a linear sequence.  (And if you show otherwise,
you have a crypto paper and quite a name for yourself.)


 And I think we need splitting, especially as more Haskell programs become
 parallel.


 I do not agree here, I'm afraid.


Could you expound on this also?  The people I know in the parallelism
community seem to care a lot about deterministic PRNG in parallel programs.
 For example, the Cilk folks at MIT and Intel who I work with are *modifying
their runtime system *just to get deterministic parallel PRNG.

For example our in our Monad Par package splittable RNG will allow us to
add a variant of the monad that provides randomness and transparently routes
the state through the forks in the parallel computation, retaining the
model's determinism.

-Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-17 Thread Bryan O'Sullivan
On Wed, Aug 17, 2011 at 12:27 PM, Ryan Newton rrnew...@gmail.com wrote:

 The more fundamental problem is that splitting is neither well understood
 nor generally safe, and as such it should not be in the basic Random class.


 Would you mind elaborating?


Certainly. The purpose of splitting a PRNG is to create two new PRNGs, with
the following properties:

   - Long periods
   - The streams generated by each PRNG must be independent
   - Splitting must be fast, while preserving the above two properties

It's very easy to write a split function that gets at least one of these
considerations wrong.


 But I am under the impression that it is well understood by Burton Smith
 and others who have worked on the topic, and that they assure us that using
 AES, RNG's under any series of splits are as strong as those generated in a
 linear sequence.


The trouble is that very few people have worked on this topic - there's
almost no published literature on it. And Burton and his colleagues readily
admit that the technique they suggest is a matter of folk wisdom in the
crypto community.

A typical technical application of a PRNG is for Monte Carlo processes,
where you want to (a) have a very fast PRNG and (b) understand its
properties. Burton's off-the-cuff suggestion is all very well, but we don't
know important things like what the performance or period of that PRNG would
be. For instance, if we don't know a PRNG's period, we don't know when we're
going to start seeing autocorrelations in its output, or if it supports
splitting, how many splits are safe to perform before we start seeing *cross
*-stream correlation. This in turn means that we don't know whether, when,
or why the consumers of our pseudo-random numbers are going to themselves
start producing garbage results, and that's troubling to me.

Could you expound on this also?  The people I know in the parallelism
 community seem to care a lot about deterministic PRNG in parallel programs.


Yep, but don't conflate determinism with splitting. In the imperative world,
you normally know how many CPUs you have, so you initialize one PRNG per
CPU, and simply go from there; there's no need for splitting. In the
parallel community, people are going out of their way to *avoid* splitting.

The only good treatments of this subject that I know are published by the
SPRNG authors at FSU.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] why is Random in System?

2011-08-16 Thread Evan Laforge
I've noticed there's a convention to put modules having to deal with
randomness into System.Random.  I thought System was for OS
interaction?  Granted getting a random seed usually means going to the
OS, but isn't the rest of it, like generating random sequences,
distributions, selecting based on probability, shuffling, etc. all
non-OS related algorithms?

I'm not sure where I would expect Random to go, perhaps Math or maybe
the toplevel, but under System seems, well, random...

I notice random-fu puts it under Data, which is also not where I'd
look, except that you always look in Data because everything goes into
Data... but algorithms dealing with random numbers aren't really data
structures either, are they?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-16 Thread Thomas DuBuisson
I think of it as natural for exactly the reason you stated (the data
comes from the OS).  It seems even more natural to me in the entropy
package module 'System.Entropy' as I am accustom to the phrase system
entropy.  Equally, I would fine a 'Network.Entropy' module acceptable
under the assumption it connects to one of the public random number
servers for it's data.

Perhaps a top level Random. should be used, but that too can be
questioned.  For example, when I import the module Random or perhaps
Random.Generators would I get fast prngs?  Cryptographic prngs?
Both?  Something else (both slow and weak, like what we have now ;-)
)?

Cheers,
Thomas

On Tue, Aug 16, 2011 at 1:04 PM, Evan Laforge qdun...@gmail.com wrote:
 I've noticed there's a convention to put modules having to deal with
 randomness into System.Random.  I thought System was for OS
 interaction?  Granted getting a random seed usually means going to the
 OS, but isn't the rest of it, like generating random sequences,
 distributions, selecting based on probability, shuffling, etc. all
 non-OS related algorithms?

 I'm not sure where I would expect Random to go, perhaps Math or maybe
 the toplevel, but under System seems, well, random...

 I notice random-fu puts it under Data, which is also not where I'd
 look, except that you always look in Data because everything goes into
 Data... but algorithms dealing with random numbers aren't really data
 structures either, are they?

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-16 Thread James Cook
On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote:

 I've noticed there's a convention to put modules having to deal with
 randomness into System.Random.  I thought System was for OS
 interaction?  Granted getting a random seed usually means going to the
 OS, but isn't the rest of it, like generating random sequences,
 distributions, selecting based on probability, shuffling, etc. all
 non-OS related algorithms?
 
 I'm not sure where I would expect Random to go, perhaps Math or maybe
 the toplevel, but under System seems, well, random...
 
 I notice random-fu puts it under Data, which is also not where I'd
 look, except that you always look in Data because everything goes into
 Data... but algorithms dealing with random numbers aren't really data
 structures either, are they?

System definitely does seem like an odd choice.  In most cases the only 
interaction any PRNG, even when accessed via the FFI, has with the system is 
- as you say - to get an initial seed value for a global instance.

When I wrote random-fu I chose to use Data.Random based on the perspective is 
that a random variable or process _is_ just a mathematical object, and can be 
represented by an abstract data structure.  I'm sure there's a case to be made 
against that view too, and if someone were to present a good argument for 
something better I'd even consider changing it. It seems to me, though, that 
the line between data and not-data is pretty fuzzy in a functional language 
(which is one of the many things that makes them great), and for me it seems 
quite natural to think of random variables as data.  At least, in practice it 
feels a lot more like manipulating data than anything else.  But then I'm one 
of those weirdos who thinks of IO t as just a data structure too.

-- James
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-16 Thread Brandon Allbery
On Tue, Aug 16, 2011 at 17:07, James Cook mo...@deepbondi.net wrote:

 On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote:
  I've noticed there's a convention to put modules having to deal with
  randomness into System.Random.  I thought System was for OS
  interaction?  Granted getting a random seed usually means going to the
  OS, but isn't the rest of it, like generating random sequences,
  distributions, selecting based on probability, shuffling, etc. all
  non-OS related algorithms?

 System definitely does seem like an odd choice.  In most cases the only
 interaction any PRNG, even when accessed via the FFI, has with the system
 is - as you say - to get an initial seed value for a global instance.


I'd be tempted to guess that the whole reason it's under System is the IO
component.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-16 Thread Evan Laforge
Yeah, fair enough about getting the seed.  I think I like the idea of
breaking them into System.Entropy and then Random or Data.Random.  It
feels odd to stick pure algorithm packages, which simply accept a
random seed or stream from elsewhere, under System.Random.

There are a fair number of alternate implementations on hackage which
suggests people aren't satisfied with the stdlib one but haven't quite
settled down on a standard alternative.

James Cook, good point about Data.  I suppose that's also why it seems
to be the catch all for everything in haskell.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] why is Random in System?

2011-08-16 Thread Niklas Larsson
I don't like the idea of Data.Random because random numbers use
ordinary number types, and the generator itself is not the object of
interest, the numbers are. I'd much prefer Math.Random. As the Math
prefix isn't used in the core libraries maybe Control.Random is the
least unpalatable alternative.

Regards,
Niklas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe