Re: [swift-evolution] [Proposal] Random Unification

Jonathan Hull via swift-evolution Wed, 04 Oct 2017 02:49:48 -0700

@Xiaodi:  What do you think of the possibility of trapping in cases of low 
entropy, and adding an additional global function that checks for entropy so 
that conscientious programmers can avoid the trap and provide an alternative 
(or error message)?
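
A minimal sketch of that shape, assuming hypothetical names (`EntropySource`, `hasSufficientEntropy`, `randomWord`) that are not part of any proposal. The standard library's `SystemRandomNumberGenerator` is used only as a stand-in for the platform source:

```swift
// All names here are hypothetical; this is not a proposed or existing API.

/// Stand-in for the platform entropy source. `isSeeded` models whether the
/// system has gathered enough entropy yet.
struct EntropySource {
    var isSeeded: Bool

    /// Returns nil when entropy is unavailable.
    func nextWord() -> UInt64? {
        guard isSeeded else { return nil }
        var generator = SystemRandomNumberGenerator()
        let word: UInt64 = generator.next()
        return word
    }
}

/// The "additional global function": a check that never traps.
func hasSufficientEntropy(_ source: EntropySource) -> Bool {
    return source.isSeeded
}

/// The trapping variant: no Optional and no `throws` in the signature.
func randomWord(from source: EntropySource) -> UInt64 {
    guard let word = source.nextWord() else {
        fatalError("insufficient entropy for random number generation")
    }
    return word
}

/// A conscientious caller checks first and supplies an alternative.
func makeSessionToken(from source: EntropySource) -> String {
    guard hasSufficientEntropy(source) else {
        return "unavailable"
    }
    return String(randomWord(from: source), radix: 16)
}
```

Callers who never check keep the simple call site and accept the trap; callers who do check never reach it.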


Thanks,
Jon

> On Oct 4, 2017, at 2:41 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
> 
> 
> On Wed, Oct 4, 2017 at 02:39 Félix Cloutier <felixclout...@icloud.com> wrote:
> I'm really not enthusiastic about `random() -> Self?` or `random() throws -> 
> Self` when the only possible error is that some global object hasn't been 
> initialized.
> 
> The idea of having `random` straight on integers and floats and collections 
> was to provide a simple interface, but using a global CSPRNG for those 
> operations comes at a significant usability cost. I think that something has 
> to go:
> 
> * Drop the random methods on FixedWidthInteger, FloatingPoint
> * ...or drop the CSPRNG as a default
> * Drop the optional/throws, and trap on error
> 
> I know I wouldn't use the `Int.random()` method if I had to unwrap every 
> single result, when getting one non-nil result guarantees that the program 
> won't see any other nil result again until it restarts.
> 
> From the perspective of an app that can be suspended and resumed at any time, 
> “until it restarts” could be as soon as the next invocation of 
> `Int.random()`, could it not?
> 
> 
> Félix
> 
>> On Oct 3, 2017, at 11:44 PM, Jonathan Hull <jh...@gbis.com> wrote:
>> 
>> I like the idea of splitting it into 2 separate “Random” proposals.
>> 
>> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
>> 
>> On FixedWidthInteger:
>>      static func random() throws -> Self
>>      static func random(in range: ClosedRange<Self>) throws -> Self
>> 
>> On Double:
>>      static func random() throws -> Double
>>      static func random(in range: ClosedRange<Double>) throws -> Double
>> 
>> (Everything else we want, like shuffled(), could be built in later proposals 
>> by calling those functions)
>> 
>> The other option would be to remove the ‘throws’ from the above functions 
>> (perhaps fatalError-ing), and provide an additional function which can be 
>> used to check that there is enough entropy (so as to avoid the crash or fall 
>> back to a worse source when the CSPRNG is unavailable).
>> 
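
A rough sketch of the throwing variant, under stated assumptions: the method is renamed `secureRandom(from:)` so it cannot collide with anything in the standard library, `BuiltInSource` and `RandomError` are hypothetical, and the ranged overload is left out because unbiased range reduction would need more care than fits here.

```swift
// Hypothetical names; the message above spells the method `random()`.
enum RandomError: Error {
    case insufficientEntropy
}

/// Stand-in for the built-in CSPRNG. Pass `available: false` to simulate
/// a system that cannot provide entropy.
struct BuiltInSource {
    var available: Bool

    func bytes(count: Int) throws -> [UInt8] {
        guard available else { throw RandomError.insufficientEntropy }
        var generator = SystemRandomNumberGenerator()
        var result: [UInt8] = []
        for _ in 0..<count {
            let word: UInt64 = generator.next()
            result.append(UInt8(truncatingIfNeeded: word))
        }
        return result
    }
}

extension FixedWidthInteger {
    /// Counterpart of `static func random() throws -> Self`.
    static func secureRandom(from source: BuiltInSource) throws -> Self {
        let bytes = try source.bytes(count: (Self.bitWidth + 7) / 8)
        var value: Self = 0
        for byte in bytes {
            // Shift in one byte at a time; truncation keeps only the low bits.
            value = (value << 8) | Self(truncatingIfNeeded: byte)
        }
        return value
    }
}

/// Every call site has to handle the error, even when it is very unlikely.
func rollOrDefault(from source: BuiltInSource) -> UInt8 {
    do {
        return try UInt8.secureRandom(from: source)
    } catch {
        return 0
    }
}
```

The `do`/`catch` in `rollOrDefault` is the call-site cost that the non-throwing alternative would remove.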
>> 
>> 
>> Then a second proposal would bring in the concept of RandomSources (whatever 
>> we call them), which can return however many random bytes you ask for… and a 
>> protocol for types which know how to initialize themselves from those bytes. 
>>  That might be spelled like 'static func random(using: RandomSource) -> Self'. 
>>  As a convenience, the source would also be able to create 
>> FixedWidthIntegers and Doubles (both with and without a range), and would 
>> also have the coinFlip() and oneIn(UInt) -> Bool functions. Most types should 
>> be able to build themselves off of that.  There would be a default source 
>> which is built from the first protocol.
>> 
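
One way the second proposal's pieces could fit together, as a sketch only. `RandomSource`, `coinFlip()` and `oneIn(_:)` follow the names in the message; `RandomInitializable`, `randomBytes(count:)`, `nextUInt64()`, `GridCell`, `SystemBackedSource` and `FixedSource` are invented for illustration.

```swift
// Hypothetical names throughout, loosely following the description above.

/// A source that can return however many random bytes are asked for.
protocol RandomSource {
    mutating func randomBytes(count: Int) -> [UInt8]
}

/// A type that knows how to initialize itself from a source's bytes.
protocol RandomInitializable {
    static func random<S: RandomSource>(using source: inout S) -> Self
}

extension RandomSource {
    /// Convenience: assemble eight bytes into one 64-bit value.
    mutating func nextUInt64() -> UInt64 {
        var value: UInt64 = 0
        for byte in randomBytes(count: 8) {
            value = (value << 8) | UInt64(byte)
        }
        return value
    }

    /// True or false with equal probability (uses the low bit of one byte).
    mutating func coinFlip() -> Bool {
        return (randomBytes(count: 1)[0] & 1) == 1
    }

    /// True with probability 1/n. Values in the uneven tail of the 64-bit
    /// range are rejected so that the modulo step introduces no bias.
    mutating func oneIn(_ n: UInt) -> Bool {
        precondition(n > 0, "n must be positive")
        let bound = UInt64(n)
        let limit = UInt64.max - (UInt64.max % bound)
        var value = nextUInt64()
        while value >= limit {
            value = nextUInt64()
        }
        return value % bound == 0
    }
}

extension UInt64: RandomInitializable {
    static func random<S: RandomSource>(using source: inout S) -> UInt64 {
        return source.nextUInt64()
    }
}

/// A user type that builds itself from the primitives above.
struct GridCell: RandomInitializable {
    var row: UInt64
    var column: UInt64

    static func random<S: RandomSource>(using source: inout S) -> GridCell {
        let row = UInt64.random(using: &source)
        let column = UInt64.random(using: &source)
        return GridCell(row: row, column: column)
    }
}

/// Default source, backed here by the system generator as a stand-in.
struct SystemBackedSource: RandomSource {
    mutating func randomBytes(count: Int) -> [UInt8] {
        var generator = SystemRandomNumberGenerator()
        return (0..<count).map { _ in
            UInt8.random(in: UInt8.min...UInt8.max, using: &generator)
        }
    }
}

/// Deterministic source that repeats one byte; handy for tests.
struct FixedSource: RandomSource {
    var byte: UInt8
    mutating func randomBytes(count: Int) -> [UInt8] {
        return Array(repeating: byte, count: count)
    }
}
```

Only `randomBytes(count:)` is a requirement; everything else is derived from it, which is what lets most types build themselves from any source.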
>> I also really think we should have a concept of Repeatably-Random as a 
>> subprotocol for the second proposal.  I see far too many shipping apps which 
>> have bugs due to using arc4random when they really needed a repeatable 
>> source (e.g. patterns and lines jump around when you resize things). If it 
>> was an easy option, people would use it when appropriate. This would just 
>> mean a sub-protocol which has an initializer which takes a seed, and the 
>> ability to save/restore state (similar to CGContexts).
>> 
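
A sketch of what the repeatable sub-protocol might look like. The protocol name and its members are assumptions, and SplitMix64 is swapped in purely as a small, well-known, non-cryptographic generator; it must not be used where security matters.

```swift
/// Hypothetical sub-protocol: a seedable source whose state can be saved
/// and restored later.
protocol RepeatableRandomSource: RandomNumberGenerator {
    associatedtype State
    init(seed: UInt64)
    var state: State { get set }
}

/// SplitMix64: deterministic and fast, but NOT cryptographically secure.
struct SplitMix64: RepeatableRandomSource {
    var state: UInt64

    init(seed: UInt64) {
        self.state = seed
    }

    mutating func next() -> UInt64 {
        state = state &+ 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

/// Decorative jitter that must not jump around on resize: derive it from a
/// stored seed so that every layout pass sees the same offsets.
func jitterOffsets(count: Int, seed: UInt64) -> [Int] {
    var source = SplitMix64(seed: seed)
    return (0..<count).map { _ in Int.random(in: -4...4, using: &source) }
}
```

Because the sketch conforms to the standard library's `RandomNumberGenerator`, the existing `random(in:using:)` methods accept it directly.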
>> The second proposal would also include things like shuffled() and 
>> shuffled(using:).
>> 
>> Thanks,
>> Jon
>> 
>> 
>> 
>>> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <aalonso...@outlook.com> wrote:
>>> 
>>> I really like the schedule here. After reading for a while, I do agree with 
>>> Brent that stdlib should be very primitive in the functionality that it 
>>> provides. I also agree that the most important part right now is designing 
>>> the internal crypto that the numeric types use to return their respective 
>>> random numbers. On the question of how we should handle insufficient entropy 
>>> from the device random: from a user's perspective, it makes sense that 
>>> calling .random should just give me a random number, but from a developer's 
>>> perspective I see Optional as the best choice here. While I think 
>>> blocking could, in most cases, provide the user an easier API, we have to 
>>> do this right and be safe by returning a value that signals that an error 
>>> is possible. As for the generator abstraction, I believe there should be a 
>>> bare-bones protocol that sets a layout for new generators, and we should 
>>> focus on its requirements.
>>> 
>>> Whether or not RandomAccessCollection and MutableCollection should get 
>>> .random and .shuffle/.shuffled in this first proposal is completely up in 
>>> the air for me. It makes sense, to me, to include .random in this 
>>> proposal and open another one for .shuffle/.shuffled, but I can see 
>>> arguments that we should create something separate for these two, or 
>>> include all of it in this proposal.
>>> 
>>> - Alejandro
>>> 
>>> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi...@gmail.com>, wrote:
>>>> 
>>>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixclout...@icloud.com> wrote:
>>>>> On Sep 26, 2017, at 4:14 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>>>> 
>>>> 
>>>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <felixclout...@icloud.com> wrote:
>>>>> 
>>>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a 
>>>>> reproducible sequence, but when you use it as a CSPRNG, you typically 
>>>>> feed entropy back into it at nondeterministic points to ensure that even 
>>>>> if you started with a bad seed, you'll eventually get to an alright 
>>>>> state. Unless you keep track of when entropy was mixed in and what the 
>>>>> values were, you'll never get a reproducible CSPRNG.
>>>>> 
>>>>> We would give developers a false sense of security if we provided them 
>>>>> with CSPRNG-grade algorithms that we called CSPRNGs and that they could 
>>>>> seed themselves. Just because it says "crypto-secure" in the name doesn't 
>>>>> mean that it'll be crypto-secure if it's seeded with time(). Therefore, 
>>>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>>> 
>>>>> I disagree here, in two respects:
>>>>> 
>>>>> First, whether or not a particular PRNG is cryptographically secure is an 
>>>>> intrinsic property of the algorithm; whether it's "reproducible" or not 
>>>>> is determined by the published API. In other words, the distinction 
>>>>> between CSPRNG vs. non-CSPRNG is important to document because it's 
>>>>> semantics that cannot be deduced by the user otherwise, and it is an 
>>>>> important one for writing secure code because it tells you whether an 
>>>>> attacker can predict future outputs based only on observing past outputs. 
>>>>> "Reproducible" in the sense of seedable or not is trivially noted by 
>>>>> inspection of the published API, and it is rather immaterial to writing 
>>>>> secure code.
>>>> 
>>>> 
>>>> Cryptographically secure is not a property that I'm comfortable applying 
>>>> to an algorithm. You cannot say that you've made a cryptographically 
>>>> secure thing just because you've used all the right algorithms: you also 
>>>> have to use them right, and one of the most critical components of a 
>>>> cryptographically secure PRNG is its seed.
>>>> 
>>>> A cryptographically secure algorithm isn’t sufficient, but it is 
>>>> necessary. That’s why it’s important to mark them as such. If I'm a 
>>>> careful developer, then it is absolutely important to me to know that I’m 
>>>> using a PRNG with a cryptographically secure algorithm, and that the 
>>>> particular implementation of that algorithm is correct and secure.
>>>> 
>>>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>>> 
>>>> * You cannot seed or add entropy to std::random_device
>>>> 
>>>> Although std::random_device may in practice be backed by a software 
>>>> CSPRNG, IIUC, the intention is that it can provide access to a hardware 
>>>> non-deterministic source when available.
>>>> 
>>>> * You cannot seed or add entropy to CryptGenRandom
>>>> * You can only add entropy to /dev/(u)random
>>>> * You can only add entropy to BSD's arc4random
>>>> 
>>>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an 
>>>> entirely deterministic algorithm; the output is non-random and the 
>>>> algorithm itself requires no entropy. If a PRNG is seeded with a random 
>>>> sequence of bits, its output can "appear" to be random. A CSPRNG is a PRNG 
>>>> that fulfills certain criteria such that its output can be appropriate for 
>>>> use in cryptographic applications in place of a truly random sequence *if* 
>>>> the input to the CSPRNG is itself random.
>>>> 
>>>> The examples you give above *incorporate* a CSPRNG, environment entropy, 
>>>> and a set of rules about when to mix in additional entropy in order to 
>>>> produce output indistinguishable from a random sequence, but they are 
>>>> *not* themselves really *pseudorandom* generators because they are not 
>>>> deterministic. Not only do such sources of random numbers not require an 
>>>> interface to allow seeding, they do not even have to be publicly 
>>>> instantiable: Swift need only expose a single thread-safe instance (or an 
>>>> instance per thread) of a single type that provides access to 
>>>> CryptGenRandom/urandom/arc4random, since after all the output of multiple 
>>>> instances of that type should be statistically indistinguishable from the 
>>>> output of only one.
>>>> 
>>>> What I was trying to respond to, by contrast, is the design of a hierarchy 
>>>> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, 
>>>> UnsafeRandomSource : RandomSource) and the appropriate APIs to expose on 
>>>> each. This is entirely inapplicable to your examples. It stands to reason 
>>>> that a non-instantiable source of random numbers does not require a 
>>>> protocol of its own (a hypothetical RNG : CSPRNG), since there is no 
>>>> reason to implement (if done correctly) more than a single publicly 
>>>> non-instantiable singleton type that could conform to it. For that matter, 
>>>> the concrete type itself probably doesn't need *any* public API at all. 
>>>> Instead, extensions to standard library types such as Int that implement 
>>>> conformance to the protocol that Alejandro names "Randomizable" could call 
>>>> internal APIs to provide all the necessary functionality, and third-party 
>>>> types that need to conform to "Randomizable" could then in turn use 
>>>> `Int.random()` or `Double.random()` to implement their own conformance. In 
>>>> fact, the concrete random number generator type doesn't need to be public 
>>>> at all. All public interaction could be through APIs such as 
>>>> `Int.random()`.
>>>> 
>>>> 
>>>> Just because we can expose a seed interface doesn't mean we should, and in 
>>>> this case I believe that it would go against the prime objective of 
>>>> providing secure random numbers.
>>>> 
>>>> 
>>>> If we're talking about a Swift interface to a non-deterministic source of 
>>>> random numbers like urandom or arc4random, then, as I write above, not 
>>>> only do I agree that it doesn't need to be seedable, it also does not need 
>>>> to be instantiable at all, does not need to conform to a protocol that 
>>>> specifically requires the semantics of a non-deterministic source, does 
>>>> not need to expose any public interface whatsoever, and doesn't itself 
>>>> even need to be public. (Does it even need to be a type, as opposed to 
>>>> simply a free function?)
>>>> 
>>>> In fact, having reasoned through all of this, we can split the design task 
>>>> into two. The most essential part, which definitely should be part of the 
>>>> stdlib, would be an internal interface to a cryptographically secure 
>>>> platform-specific entropy source, a public protocol named something like 
>>>> Randomizable (to be bikeshedded), and the appropriate implementations on 
>>>> Boolean, binary integer, and floating point types to conform them to 
>>>> Randomizable so that users can write `Bool.random()` or `Int.random()`. 
>>>> The second part, which can be a separate proposal or even a standalone 
>>>> core library or third-party library, would be the protocols and concrete 
>>>> types that implement pseudorandom number generators, allowing for 
>>>> reproducible pseudorandom sequences. In other words, instead of PRNGs and 
>>>> CSPRNGs being the primitives on which `Int.random()` is implemented, 
>>>> `Int.random()` should be the standard library primitive which allows PRNGs 
>>>> and CSPRNGs to be seeded.
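
To illustrate that inversion, here is a small sketch in which the standard library primitive supplies the seed and a separately defined deterministic generator consumes it. Today's `UInt64.random(in:)` stands in for the `Int.random()` discussed above; `Xorshift64` and `systemSeed()` are hypothetical, and xorshift is not cryptographically secure.

```swift
/// Marsaglia's 64-bit xorshift (shifts 13, 7, 17). Deterministic and NOT
/// cryptographically secure; it only stands in for "some seedable PRNG".
struct Xorshift64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        // The all-zero state is a fixed point of xorshift, so avoid it.
        state = (seed == 0) ? 0x9E3779B97F4A7C15 : seed
    }

    mutating func next() -> UInt64 {
        state ^= state << 13
        state ^= state >> 7
        state ^= state << 17
        return state
    }
}

/// Stand-in for the standard library primitive: the entropy comes from the
/// system, and nothing about the underlying source is exposed.
func systemSeed() -> UInt64 {
    return UInt64.random(in: UInt64.min...UInt64.max)
}

/// Draws a fresh seed, runs a simulation step, and returns the seed so the
/// run can be logged and replayed exactly later.
func runSimulation(seed: UInt64 = systemSeed()) -> (seed: UInt64, rolls: [Int]) {
    var generator = Xorshift64(seed: seed)
    let rolls = (0..<5).map { _ in Int.random(in: 1...6, using: &generator) }
    return (seed, rolls)
}
```

Replaying with `runSimulation(seed: first.seed)` reproduces `first.rolls`, while the seed itself never came from anything the caller had to manage.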
>>>>> If your attacker can observe your seeding once, chances are that they can 
>>>>> observe your reseeding too; then, they can use their own implementation 
>>>>> of the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your 
>>>>> pseudorandom sequence whether or not Swift exposes any particular API.
>>>> 
>>>> On Linux, the random devices are initially seeded with machine-specific 
>>>> but rather invariant data that makes /dev/urandom spit out predictable 
>>>> numbers. It is considered "seeded" after a root process writes POOL_SIZE 
>>>> bytes to it. On most implementations, this initial seed is stored on disk: 
>>>> when the computer shuts down, it reads POOL_SIZE bytes from /dev/urandom 
>>>> and saves them in a file, and the contents of that file are loaded back into 
>>>> /dev/urandom when the computer starts. A scenario where someone can read 
>>>> that file is certainly not less likely than a scenario where /dev/urandom 
>>>> was deleted. That doesn't mean that they have kernel code execution or 
>>>> that they can pry into your process, but they have a good shot at guessing 
>>>> your seed and subsequent RNG results if no stirring happens.
>>>> 
>>>> Sorry, I don't understand what you're getting at here. Again, I'm talking 
>>>> about deterministic algorithms, not non-deterministic sources of random 
>>>> numbers.
>>>> 
>>>>> Secondly, I see no reason to justify the notion that, simply because a 
>>>>> PRNG is cryptographically secure, we ought to hide the seeding 
>>>>> initializer (because one has to exist internally anyway) from the public. 
>>>>> Obviously, one use case for a deterministic PRNG is to get reproducible 
>>>>> sequences of random-appearing values; this can be useful whether the 
>>>>> underlying algorithm is cryptographically secure or not. There are 
>>>>> innumerably many ways to use data generated from a CSPRNG in 
>>>>> non-cryptographically secure ways and omitting or including a public 
>>>>> seeding initializer does not change that; in other words, using a 
>>>>> deterministic seed for a CSPRNG would be a bad idea in certain 
>>>>> applications, but it's a deliberate act, and someone who would mistakenly 
>>>>> do that is clearly incapable of *using* the output from the PRNG in a 
>>>>> secure way either; put a third way, you would be hard pressed to find a 
>>>>> situation where it's true that "if only Swift had not made the seeding 
>>>>> initializer public, this author would have written secure code, but 
>>>>> instead the only security hole that existed in the code was caused by the 
>>>>> availability of a public seeding initializer mistakenly used." The point 
>>>>> of having both explicitly instantiable PRNGs and a layer of simpler APIs 
>>>>> like "Int.random()" is so that the less experienced user can get the 
>>>>> "right thing" by default, and the experienced user can customize the 
>>>>> behavior; any user that instantiates his or her own ChaCha20Random 
>>>>> instance is already calling for the power user interface; it is 
>>>>> reasonable to expose the underlying primitive operations (such as 
>>>>> seeding) so long as there are legitimate uses for it.
>>>> 
>>>> Nothing prevents us from using the same algorithm for a CSPRNG that is 
>>>> safely pre-seeded and a PRNG that people seed themselves, mind you. 
>>>> However, especially when it comes to security, there is a strong 
>>>> responsibility to drive developers into a pit of success: the most obvious 
>>>> thing to do has to be the right one, and suggesting to 
>>>> cryptographically-unaware developers that they have everything they need 
>>>> to manage their own seed is not a step in that direction.
>>>> 
>>>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly 
>>>> calling it cryptographically-secure, because it is not unless you know 
>>>> what to do with it. It is emphatically not far-fetched to imagine a 
>>>> developer who thinks that they can outdo the standard library by using 
>>>> their own ChaCha20Random instance after it's been seeded with time() if we 
>>>> let them know that it's "cryptographically secure". If you're a power user 
>>>> and you don't like the default, known-good CSPRNG, then you're hopefully 
>>>> good enough to know that ChaCha20 is considered a cryptographically-secure 
>>>> algorithm without help labels from the language, and you know how to 
>>>> operate it.
>>>> 
>>>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. 
>>>>> /dev/urandom might never run out, but it is also possible for it not to 
>>>>> be initialized at all, as in the case of some VM setups. In some older 
>>>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems 
>>>>> where it is available, it can also be deleted, since it is a file. The 
>>>>> point is, all of these scenarios cause an error during seeding of a 
>>>>> CSPRNG. The question is, how to proceed in the face of inability to 
>>>>> access entropy. We must do something, because in that case we cannot return 
>>>>> a cryptographically secure answer. Rare trapping on invocation of 
>>>>> Int.random() or permanently waiting for a never-to-be-initialized 
>>>>> /dev/urandom would be terrible to debug, but returning an optional or 
>>>>> throwing all the time would be verbose. How to design this API?
>>>> 
>>>> If the only concern is that the system might not be initialized enough, 
>>>> I'd say that whatever returns an instance of a global, framework-seeded 
>>>> CSPRNG should return an Optional, and the random methods that use the 
>>>> global CSPRNG can trap and scream that the system is not initialized 
>>>> enough. If this is a likely error for you, you can check if the CSPRNG 
>>>> exists or not before jumping.
>>>> 
>>>> Also note that there is only one system for which Swift is officially 
>>>> distributed (Ubuntu 14.04) on which the only way to get entropy from the 
>>>> OS is to open a random device and read from it.
>>>> 
>>>> Again, I'm not only talking about urandom. As far as I'm aware, every API 
>>>> to retrieve cryptographically secure sequences of random bits on every 
>>>> platform for which Swift is distributed can potentially return an error 
>>>> instead of random bits. The question is, what design for our API is the 
>>>> most sensible way to deal with this contingency? On rethinking, I do 
>>>> believe that consistently returning an Optional is the best way to go 
>>>> about it, allowing the user to either (a) supply a deterministic fallback; 
>>>> (b) raise an error of their own choosing; or (c) trap--all with a minimum 
>>>> of fuss. This seems very Swifty to me.
>>>>  
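
The three options (a), (b) and (c) look like this at the call site. This is a sketch against a hypothetical Optional-returning primitive, `secureRandomByte(entropyAvailable:)`, whose parameter exists only to simulate failure:

```swift
/// Hypothetical Optional-returning primitive. The parameter only simulates
/// the rare case in which the platform cannot supply random bits.
func secureRandomByte(entropyAvailable: Bool) -> UInt8? {
    guard entropyAvailable else { return nil }
    return UInt8.random(in: UInt8.min...UInt8.max)
}

enum TokenError: Error {
    case noEntropy
}

/// (a) Supply a deterministic fallback.
func byteOrFallback(entropyAvailable: Bool) -> UInt8 {
    return secureRandomByte(entropyAvailable: entropyAvailable) ?? 0
}

/// (b) Raise an error of the caller's own choosing.
func byteOrThrow(entropyAvailable: Bool) throws -> UInt8 {
    guard let byte = secureRandomByte(entropyAvailable: entropyAvailable) else {
        throw TokenError.noEntropy
    }
    return byte
}

/// (c) Trap.
func byteOrTrap(entropyAvailable: Bool) -> UInt8 {
    return secureRandomByte(entropyAvailable: entropyAvailable)!
}
```

Each choice costs one operator or one `guard`, which is the "minimum of fuss" argument; the counterargument elsewhere in the thread is that this cost is paid at every call.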
>>>> 
>>>>>> * What should the default CSPRNG be? There are good arguments for using 
>>>>>> a cryptographically secure device random. (In my proposed 
>>>>>> implementation, for device random, I use Security.framework on Apple 
>>>>>> platforms (because /dev/urandom is not guaranteed to be available due to 
>>>>>> the sandbox, IIUC). On Linux platforms, I would prefer to use 
>>>>>> getrandom() and avoid using file system APIs, but getrandom() is new and 
>>>>>> unsupported on some versions of Ubuntu that Swift supports. This is an 
>>>>>> issue in and of itself.) Now, a number of these facilities strictly 
>>>>>> limit or do not guarantee availability of more than a small number of 
>>>>>> random bytes at a time; they are recommended for seeding other PRNGs but 
>>>>>> *not* as a routine source of random numbers. Therefore, although device 
>>>>>> random should be available to users, it probably shouldn’t be the 
>>>>>> default for the Swift standard library as it could have negative 
>>>>>> consequences for the system as a whole. There follows the significant 
>>>>>> task of implementing a CSPRNG correctly and securely for the default 
>>>>>> PRNG.
>>>>> 
>>>>> Theo gave a talk a few years ago 
>>>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these 
>>>>> problems are approached in LibreSSL.
>>>>> 
>>>>> Certainly, we can learn a lot from those like Theo who've dealt with the 
>>>>> issue. I'm not in a position to watch the talk at the moment; can you 
>>>>> summarize what the tl;dr version of it is?
>>>> 
>>>> I saw it three years ago, so I don't remember all the details. The gist is 
>>>> that:
>>>> 
>>>> * OpenBSD's random is available from extremely early in the boot process 
>>>>   with reasonable entropy
>>>> * LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which 
>>>>   doesn't actually use ARC4)
>>>> * That implementation of arc4random is good because it is fool-proof and it 
>>>>   has basically no failure mode
>>>> * Stirring is good, having multiple components take random numbers from the 
>>>>   same source probably makes results harder to guess too
>>>> * Getrandom/getentropy is in all ways better than reading from random devices
>>>> 
>>>> Vigorously agree on all points. Thanks for the summary. 
>>>> 
>> 
>

