Re: [Haskell-cafe] why is Random in System?
Yep, but don't conflate determinism with splitting. In the imperative world, you normally know how many CPUs you have, so you initialize one PRNG per CPU, and simply go from there; there's no need for splitting. In the parallel community, people are going out of their way to *avoid* splitting. I'm having trouble thinking of scenarios where per-CPU does the trick. Do you mean one per pthread rather than one per CPU? In the Cilk case, you've got to deal with work stealing of course. So you want rand() to generate a result that is determined by the current stack-frame's position in the binary-tree of spawns. In the work I was referring to: http://groups.csail.mit.edu/sct/wiki/index.php?title=Other_Projects#Deterministic_Parallel_Random-Number_Generation ... they try a bunch of different methods and I can't remember if any of them split the RNG eagerly as they go down the spawn tree or if they just record the tree-index on the way down and then read it out when they generated randoms. (I think the latter.) Cheers, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
Brandon Allbery allber...@gmail.com wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the system is - as you say - to get an initial seed value for a global instance. I'd be tempted to guess that the whole reason it's under System is the IO component. That's not really valid, is it? After all the new 'time' package is also stationed under the Data tree, and it has a similarly large IO component. I have to say, it seems very intuitive to me to look for it under Data, even though I'm not sure why. Probably I'm just used to it. Time has a strong connection to the operating system and the hardware, so it could just as well go into the System tree. For (non-cryptographic) randomness however we are dealing with numerical data, for which the connection to the system is mere convenience, so I wouldn't mind finding it under Data at all. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
Hi all, I'm the maintainer of random. If people could decide on what the alternative name would be we could put it through the library proposal process. It seems that one problem at this moment is the lack of a single, clear right answer. Replacing one debatable not-quite-right choice with another may not be satisfying ;-). Also, what Thomas says is right. The current implementation is SLOW and WEAK, which would not seem to make a good default implementation. The goal is to replace it with something better so that the default random package is strong in at least one dimension. I think this is important because I imagine many people use the default package, for example because they don't want to scour hackage and try all the alternatives. My proposal for this has been to use AES based crypto-prng. I think that is fast enough (i.e. faster than what's currently there), very strong, and splittable. New Intel and AMD hardware has hardware support for AES which makes it even faster. The intel-aes package provides this functionality, with and without hardware support. But there's work left to do in terms of testing, making sure its cross platform, etc. Anyone who's interested in helping (especially with Windows support) would be warmly welcomed! Cheers, -Ryan On Wed, Aug 17, 2011 at 4:46 AM, Ertugrul Soeylemez e...@ertes.de wrote: Brandon Allbery allber...@gmail.com wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the system is - as you say - to get an initial seed value for a global instance. I'd be tempted to guess that the whole reason it's under System is the IO component. That's not really valid, is it? After all the new 'time' package is also stationed under the Data tree, and it has a similarly large IO component. I have to say, it seems very intuitive to me to look for it under Data, even though I'm not sure why. Probably I'm just used to it. Time has a strong connection to the operating system and the hardware, so it could just as well go into the System tree. For (non-cryptographic) randomness however we are dealing with numerical data, for which the connection to the system is mere convenience, so I wouldn't mind finding it under Data at all. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
Ryan Newton rrnew...@gmail.com wrote: I'm the maintainer of random. If people could decide on what the alternative name would be we could put it through the library proposal process. It seems that one problem at this moment is the lack of a single, clear right answer. Replacing one debatable not-quite-right choice with another may not be satisfying ;-). Also, what Thomas says is right. The current implementation is SLOW and WEAK, which would not seem to make a good default implementation. The goal is to replace it with something better so that the default random package is strong in at least one dimension. I think this is important because I imagine many people use the default package, for example because they don't want to scour hackage and try all the alternatives. My proposal for this has been to use AES based crypto-prng. I think that is fast enough (i.e. faster than what's currently there), very strong, and splittable. New Intel and AMD hardware has hardware support for AES which makes it even faster. The intel-aes package provides this functionality, with and without hardware support. But there's work left to do in terms of testing, making sure its cross platform, etc. Anyone who's interested in helping (especially with Windows support) would be warmly welcomed! Using a cryptographically strong random number generator here is probably a very bad idea. Two reasons: Firstly while being faster than the current implementation an AES-based implementation will still be considerably slower than the Mersenne Twister algorithm. This may or may not be true, if hardware AES support is there, but don't just assume that everybody has AES instructions now. For example I don't have them. Secondly there is no standard requiring that the default random number generator is cryptographically safe. Changing this particular implementation, which is the one most people use, to a CSPRNG will make people take for granted that System.Random is safe to use in security-related products, because it would be very convenient. This will render strong security products trivially weak, when compiled with the wrong Haskell distribution, and you will find packages with statements like: We assume that you use Ryan Newton's distribution of the random package. I would rather propose the Mersenne Twister as the default random number generator. You could add AES as a secondary generator for people requiring cryptographic strength, but then do it properly, i.e. impure, because most people, when reading about a PRNG with AES anywhere in its name, will just assume that it's a CSPRNG. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
On Wed, Aug 17, 2011 at 8:56 AM, Ryan Newton rrnew...@gmail.com wrote: I'm the maintainer of random. If people could decide on what the alternative name would be we could put it through the library proposal process. It seems that one problem at this moment is the lack of a single, clear right answer. Replacing one debatable not-quite-right choice with another may not be satisfying ;-). The entire premise of that discussion seems quite ridiculous. All that renaming the modules will achieve is the breakage of several hundred packages. The roots of the module naming hierarchy - Control, Data, System, and so on - are so muddled and inconsistently used that the best advice you could give to people who raise this topic is to pretend those roots are simply not there. My proposal for this has been to use AES based crypto-prng. We'd be better off if you could seek consensus from PRNG maintainers on a fixed-up Random class before attacking this problem, so that we'd have a better chance of achieving cross-package compatibility. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
The problem with Mersenne twister is that it doesn't split well. The main reason for crypto prng in this package would not be to advertise to people that System.Random can be used for security-related apps *but to make splitting reasonably safe*. It's not good enough to have a known-bad generator under splitting provided as the default. And I think we need splitting, especially as more Haskell programs become parallel. Would it address your concerns to not mention the crypto nature of the standard implementation in the System.Random documentation? I think there's also a reasonable argument to lean towards *correctness* over performance in Haskell's defaults. For example, unconstrained Num bindings default to Integer and likewise random numbers could be as strong as possible by default, and those looking for raw rands/sec throughput could make other informed choices. I had thought that maybe we could bifurcate the stdgen concept into a fast and a strong version, which could be say Mersenne Twister and AES respectively. But, again, the problem comes if the fast version is expected to be splittable as well. With SplittableGen factored out from RandomGen I suppose it would be possible for the fast version to NOT offer splitting. However, Mersenne Twister is best used with an imperative interface, you can see the tension in the pure version of the mersenne package on hackage: http://hackage.haskell.org/package/mersenne-random-pure64-0.2.0.3 Please also see Burton Smith's comments below in response to my proposal to offer a MT + AES combination. Best, -Ryan -- Forwarded message -- From: Burton Smith burt...@microsoft.com Date: Tue, Jun 28, 2011 at 1:28 PM Subject: RE: AESNI-based splittable random number generation for Haskell To: Newton, Ryan R ryan.r.new...@intel.com Mersenne Twister (MT)is a poor choice in my opinion. First, the generator state is large (2496 bytes) and it must be copied on each call to next. Split is worse; it will generate twice as many bytes per call as next will. Second, I see no good way to guarantee independence of the two generators emanating from a split. MT is hard to initialize anyway, and giant-stepping it to define the newly split generator (as we did back in the 80's paper) is not only hard for an LFSR like MT but, worse yet, it doesn't work for Haskell or other fine-grain concurrent languages because split and next will commute. Other tree RNGs, e.g. SPRNG, have the same commutativity issue. Block ciphers address this issue head-on by reducing the split independence problem to a crypto problem. A better choice might be some block cipher other than AES. Two possibilities are XTEA and RC4. Both are in Wikipedia. RC4 has 256 bytes of key state, still bigger than I would like. Another scheme is to make the number of rounds an option. With AESNI, this could scream. Burton On Wed, Aug 17, 2011 at 12:26 PM, Ertugrul Soeylemez e...@ertes.de wrote: Ryan Newton rrnew...@gmail.com wrote: Using a cryptographically strong random number generator here is probably a very bad idea. Two reasons: Firstly while being faster than the current implementation an AES-based implementation will still be considerably slower than the Mersenne Twister algorithm. This may or may not be true, if hardware AES support is there, but don't just assume that everybody has AES instructions now. For example I don't have them. Secondly there is no standard requiring that the default random number generator is cryptographically safe. Changing this particular implementation, which is the one most people use, to a CSPRNG will make people take for granted that System.Random is safe to use in security-related products, because it would be very convenient. This will render strong security products trivially weak, when compiled with the wrong Haskell distribution, and you will find packages with statements like: We assume that you use Ryan Newton's distribution of the random package. I would rather propose the Mersenne Twister as the default random number generator. You could add AES as a secondary generator for people requiring cryptographic strength, but then do it properly, i.e. impure, because most people, when reading about a PRNG with AES anywhere in its name, will just assume that it's a CSPRNG. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
On Wed, Aug 17, 2011 at 11:10 AM, Ryan Newton rrnew...@gmail.com wrote: The problem with Mersenne twister is that it doesn't split well. The main reason for crypto prng in this package would not be to advertise to people that System.Random can be used for security-related apps *but to make splitting reasonably safe*. The more fundamental problem is that splitting is neither well understood nor generally safe, and as such it should not be in the basic Random class. A more sensible API would have a Random class that lacks a split operation, and a SplittableRandom class that permits it, as you mention later in your message. Most current PRNGs could then be instances of Random, but not SplittableRandom. And I think we need splitting, especially as more Haskell programs become parallel. I do not agree here, I'm afraid. By the way, my mwc-random package is at least as fast as mersenne-twister, has smaller state, and is pure Haskell. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
The more fundamental problem is that splitting is neither well understood nor generally safe, and as such it should not be in the basic Random class. Would you mind elaborating? Splitting was not well-understood by the original authors of System.Random; that much is in the comments. Nor is it well understood by me. But I am under the impression that it is well understood by Burton Smith and others who have worked on the topic, and that they assure us that using AES, RNG's under any series of splits are as strong as those generated in a linear sequence. (And if you show otherwise, you have a crypto paper and quite a name for yourself.) And I think we need splitting, especially as more Haskell programs become parallel. I do not agree here, I'm afraid. Could you expound on this also? The people I know in the parallelism community seem to care a lot about deterministic PRNG in parallel programs. For example, the Cilk folks at MIT and Intel who I work with are *modifying their runtime system *just to get deterministic parallel PRNG. For example our in our Monad Par package splittable RNG will allow us to add a variant of the monad that provides randomness and transparently routes the state through the forks in the parallel computation, retaining the model's determinism. -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
On Wed, Aug 17, 2011 at 12:27 PM, Ryan Newton rrnew...@gmail.com wrote: The more fundamental problem is that splitting is neither well understood nor generally safe, and as such it should not be in the basic Random class. Would you mind elaborating? Certainly. The purpose of splitting a PRNG is to create two new PRNGs, with the following properties: - Long periods - The streams generated by each PRNG must be independent - Splitting must be fast, while preserving the above two properties It's very easy to write a split function that gets at least one of these considerations wrong. But I am under the impression that it is well understood by Burton Smith and others who have worked on the topic, and that they assure us that using AES, RNG's under any series of splits are as strong as those generated in a linear sequence. The trouble is that very few people have worked on this topic - there's almost no published literature on it. And Burton and his colleagues readily admit that the technique they suggest is a matter of folk wisdom in the crypto community. A typical technical application of a PRNG is for Monte Carlo processes, where you want to (a) have a very fast PRNG and (b) understand its properties. Burton's off-the-cuff suggestion is all very well, but we don't know important things like what the performance or period of that PRNG would be. For instance, if we don't know a PRNG's period, we don't know when we're going to start seeing autocorrelations in its output, or if it supports splitting, how many splits are safe to perform before we start seeing *cross *-stream correlation. This in turn means that we don't know whether, when, or why the consumers of our pseudo-random numbers are going to themselves start producing garbage results, and that's troubling to me. Could you expound on this also? The people I know in the parallelism community seem to care a lot about deterministic PRNG in parallel programs. Yep, but don't conflate determinism with splitting. In the imperative world, you normally know how many CPUs you have, so you initialize one PRNG per CPU, and simply go from there; there's no need for splitting. In the parallel community, people are going out of their way to *avoid* splitting. The only good treatments of this subject that I know are published by the SPRNG authors at FSU. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] why is Random in System?
I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random... I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
I think of it as natural for exactly the reason you stated (the data comes from the OS). It seems even more natural to me in the entropy package module 'System.Entropy' as I am accustom to the phrase system entropy. Equally, I would fine a 'Network.Entropy' module acceptable under the assumption it connects to one of the public random number servers for it's data. Perhaps a top level Random. should be used, but that too can be questioned. For example, when I import the module Random or perhaps Random.Generators would I get fast prngs? Cryptographic prngs? Both? Something else (both slow and weak, like what we have now ;-) )? Cheers, Thomas On Tue, Aug 16, 2011 at 1:04 PM, Evan Laforge qdun...@gmail.com wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random... I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? I'm not sure where I would expect Random to go, perhaps Math or maybe the toplevel, but under System seems, well, random... I notice random-fu puts it under Data, which is also not where I'd look, except that you always look in Data because everything goes into Data... but algorithms dealing with random numbers aren't really data structures either, are they? System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the system is - as you say - to get an initial seed value for a global instance. When I wrote random-fu I chose to use Data.Random based on the perspective is that a random variable or process _is_ just a mathematical object, and can be represented by an abstract data structure. I'm sure there's a case to be made against that view too, and if someone were to present a good argument for something better I'd even consider changing it. It seems to me, though, that the line between data and not-data is pretty fuzzy in a functional language (which is one of the many things that makes them great), and for me it seems quite natural to think of random variables as data. At least, in practice it feels a lot more like manipulating data than anything else. But then I'm one of those weirdos who thinks of IO t as just a data structure too. -- James ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
On Tue, Aug 16, 2011 at 17:07, James Cook mo...@deepbondi.net wrote: On Aug 16, 2011, at 4:04 PM, Evan Laforge wrote: I've noticed there's a convention to put modules having to deal with randomness into System.Random. I thought System was for OS interaction? Granted getting a random seed usually means going to the OS, but isn't the rest of it, like generating random sequences, distributions, selecting based on probability, shuffling, etc. all non-OS related algorithms? System definitely does seem like an odd choice. In most cases the only interaction any PRNG, even when accessed via the FFI, has with the system is - as you say - to get an initial seed value for a global instance. I'd be tempted to guess that the whole reason it's under System is the IO component. -- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
Yeah, fair enough about getting the seed. I think I like the idea of breaking them into System.Entropy and then Random or Data.Random. It feels odd to stick pure algorithm packages, which simply accept a random seed or stream from elsewhere, under System.Random. There are a fair number of alternate implementations on hackage which suggests people aren't satisfied with the stdlib one but haven't quite settled down on a standard alternative. James Cook, good point about Data. I suppose that's also why it seems to be the catch all for everything in haskell. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] why is Random in System?
I don't like the idea of Data.Random because random numbers use ordinary number types, and the generator itself is not the object of interest, the numbers are. I'd much prefer Math.Random. As the Math prefix isn't used in the core libraries maybe Control.Random is the least unpalatable alternative. Regards, Niklas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe