On 12/21/2010 09:26 PM, Theo de Raadt wrote:
Wow.  You really are not reading the same code, are you.

Haha, yeah I have been reading all over the map. My comments are out-of-order too.

BTW, the nanotime in arc4_stir looks like it's redundant anyway since get_random_bytes calls extract_entropy which calls add_timer_randomness.

* An unauthenticated attacker may be able to sample an almost arbitrary
amount of output from your PRNG by making new IPsec connections.

Really?  Prove it.

http://tools.ietf.org/html/rfc2405
   The payload field, as defined in [ESP], is broken down according to
   the following diagram:
      +---------------+---------------+---------------+---------------+
      |                                                               |
      +                   Initialization Vector (IV)                  +
      |                                                               |
      +---------------+---------------+---------------+---------------+
      |                                                               |
      ~              Encrypted Payload (variable length)              ~
      |                                                               |
      +---------------------------------------------------------------+
       1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

http://code.bsd64.org/cvsweb/openbsd/src/sys/arch/amd64/amd64/aesni.c?rev=1.17

Bring CBC oracle attack countermeasure from r1.32 of cryptosoft.c to
the hardware crypto accelerator land.  This fixes aes-ni, via xcrypt,
glxsb(4), hifn(4), safe(4) and ubsec(4) drivers.

Original commit message by angelos:

Don't keep the last blocksize-bytes of ciphertext for use as the next
plaintext's IV, in CBC mode. Use arc4random() to acquire fresh IVs per
message.

with and ok deraadt, ok markus, djm
@@ -343,7 +340,7 @@ aesni_encdec(struct cryptop *crp, struct
                if (crd->crd_flags & CRD_F_IV_EXPLICIT)
                        bcopy(crd->crd_iv, iv, ivlen);
                else
-                       bcopy(ses->ses_iv, iv, ivlen);
+                       arc4random_buf(iv, ivlen);

                /* Do we need to write the IV */
                if ((crd->crd_flags & CRD_F_IV_PRESENT) == 0) {

Just a hunch. Worth following up on IMHO.
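To spell out the hunch as a toy sketch (my own illustration, with random.Random standing in for the kernel generator and made-up packet framing): with per-message random IVs, every ESP packet hands a passive observer a fresh, unencrypted slice of PRNG output.

```python
import random

# Stand-in PRNG; the real concern is the kernel arc4random stream.
prng = random.Random(1234)

def send_packet(payload: bytes) -> bytes:
    # Per-message fresh IV, as in the aesni.c diff above: the IV is
    # drawn straight from the PRNG and transmitted unencrypted.
    iv = prng.getrandbits(128).to_bytes(16, "big")
    return iv + payload  # the observer sees iv in the clear

# A passive observer accumulates 16 bytes of PRNG output per packet.
observed = [send_packet(b"data")[:16] for _ in range(100)]
samples = b"".join(observed)
print(len(samples))  # 1600 bytes of raw generator output
```

So the question is only whether those samples help an attacker, not whether they can be collected.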

As I
understand it, each now sends 128 bits or so of output as plaintext over
the wire in the IV. :-)

Four days ago, if you were using a particular set of hardware drivers,
then yes.  But the software ipsec stack was fixed for this NINE YEARS
AGO.

What you just said was utterly careless.

*shrug*

* How much of nanotime() is truly unpredictable to an attacker sitting
directly on your local network cable? Maybe the bottom 10 bits, at best?

No.  The upper bits are 'more known'.  The lower bits are 'less known'.

Right, the unpredictable ones are near the bottom.

How did you come to your conclusion?

I wasn't drawing conclusions, and I tried to indicate that with hedging words and phrases like "may" and "I don't know...but" and punctuation like '?'.

Except we don't do what you suggest at all.

Excellent!

* Unless you persist entropy across reboots, you are starting from a
known state at boot.

We don't save entropy over boots. You are speaking of one specific way
of solving this, and we don't do that.  You can read the code in /etc/rc
and /etc/rc.shutdown.

At shutdown, we save a block of OUTPUT from the PRNG.
At boot, right after some extremely early system setup, we load that
block in as ENTROPY.

Let me be exact:  We do not save 'raw entropy'.

Oh, ok.

* One reason you would want to XOR entropy into the pool is so that any
nonrandom bits don't obliterate any randomness that they land on top of
(assuming they're independent). Better still, use something like a hash
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
function (or the compression function from one).

Moments later, a hash function is indeed used.  It is called MD5.

I'd been reading in the thread something about XORing vs prepending nanotime() and wanted to pitch in and answer that question. I do see now that the random_bytes are coming from MD5. But this was more about where they were going:

XOR it?  Why?

Please provide a citation regarding the benefit of XOR'ing feed data
before passing it into MD5 for the purpose of PRNG folding.

This paper covers the topic pretty thoroughly:
  http://eprint.iacr.org/2010/264.pdf

It somehow manages to be both down-to-Earth and theoretical at the same time (if that makes any sense). It even has a section on "Applications to System RNGs"!

Obviously prepending vs clobbering vs XORing will each result in differing amounts of entropy flowing into the input side. But the output side of MD5 maxes out at 128 bits. So whether it makes a real difference in theory depends on the input widths involved and on what else is happening with that same data.
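To make the XOR point concrete, here's a toy demonstration (my own, not from rnd.c): XORing a fully known input into a uniformly random pool byte leaves it uniform, while letting the input clobber the byte destroys the randomness it landed on.

```python
from collections import Counter

# Toy pool: every possible value of one uniformly random pool byte.
pool_values = list(range(256))
attacker_input = 0x00  # worst case: fully known, zero-entropy input

# XOR mixing: pool ^ input is a permutation of the pool values,
# so the result is still uniform over 256 outcomes.
xored = Counter(p ^ attacker_input for p in pool_values)

# Clobbering: the input overwrites whatever randomness was there.
clobbered = Counter(attacker_input for _ in pool_values)

print(len(xored))      # 256 distinct outcomes survive
print(len(clobbered))  # 1 -- the randomness is gone
```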

Note,
this is the first stage PRNG, and that a second stage kernel-use PRNG
is built on top of that the first one, and that a third stage
per-process PRNG is built on top of that.

What does that mean for kernel code like that doing the IV generation? Does that code have its own third stage, or does it feed from the second stage? [NM - I've read the code now]
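Here's a toy sketch of how such a chain might hang together (the staging, names, and MD5-feedback keystream are my own guesswork for illustration, not OpenBSD's code): the first stage distills the pool, each later stage is seeded from the one before it, and consumers never touch the pool directly.

```python
import hashlib

def stage1_extract(pool: bytes) -> bytes:
    # First stage: hash the raw pool down to a fixed-size seed.
    return hashlib.md5(pool).digest()

class StreamStage:
    """Toy keystream generator standing in for an arc4random stage."""
    def __init__(self, seed: bytes):
        self.state = seed
    def bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            self.state = hashlib.md5(self.state).digest()
            out += self.state
        return out[:n]

pool = b"interrupt timings, disk latencies, ..."
kernel_prng = StreamStage(stage1_extract(pool))     # second stage, kernel-wide
process_prng = StreamStage(kernel_prng.bytes(16))   # third stage, per process
print(len(process_prng.bytes(32)))  # 32
```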

* If you have to drop KB of output from RC4, you might be better off
with a hash function for stirring. Designers use hash functions because
their one-wayness is a critical property. RC4 wasn't designed to be a
one-way function any more than MD5 was designed to be used in CTR mode
as a stream cipher.

You don't know

Busted! Yeah I was just guessing.  ;-)

There's probably only one guy in the world who really knows what the precise intent was for RC4 and that's Mr. R himself.

Let's check the Google...(first PDF link is usually a good bet)
http://www.google.com/search?q=design+of+rc4

Oooo interesting... it's a Master's thesis from University of Waterloo:
"Design and Analysis of RC4-like Stream Ciphers"
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.4039&rep=rep1&type=pdf

This guy sounds like he might know. He even has a home page with a picture of himself on a mountain, like yours!
http://www.math.uwaterloo.ca/~memckagu/uwaterloo/Home.html

But there's no mention of one-way-ness being an explicit design goal for RC4. Since this is a "D & A" paper I'll take that as evidence of absence.

But look at this, this section seems eerily relevant:
3.5.1 Methods of Using IVs
There are, of course, numerous ways of combining two data strings to obtain
a third.

Combining strings! Yeah! Like a nanotime and um..random bytes.

Some of them lead to obvious relationships between the session and
secret keys. A common procedure used with RC4 is to concatenate the
IV with the secret key. Another is to XOR the two strings together.

So they mention concatenation, and there's XOR.

But what about this peculiar method of truncation used in arc4_stir, where the prepended result of nanotime() displaces random material from a fixed-sized buffer?

http://code.bsd64.org/cvsweb/openbsd/src/sys/dev/rnd.c?rev=1.104;content-type=text%2Fx-cvsweb-markup
static void arc4_stir(void) {
        u_int8_t buf[256];
        int len;
        nanotime((struct timespec *) buf);
        len = sizeof(buf) - sizeof(struct timespec);
        get_random_bytes(buf + sizeof (struct timespec), len);
        len += sizeof(struct timespec);

That method isn't mentioned. It seems to be nonstandard.

Of course, with 2048 bits to play with, what could possibly go wrong? (RC4, that's what. It's old and busted.)

Hmm, say, how big is timespec anyway? How many random bytes is it displacing?

I'm not really sure, but I wonder if it might kinda not be so great if a struct timespec ended up needing a different alignment than you get with uint8[]. But I'm just saying that because the C standards would call it 'undefined behavior' (they're so paranoid!). You kernel guys are of course intimately familiar with the platform compiler's behavior.
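For what it's worth, on a typical 64-bit platform struct timespec is 16 bytes (8-byte time_t plus 8-byte long), though the exact sizeof is ABI-dependent (it was 12 bytes on systems with a 32-bit time_t). Under the 16-byte assumption, the displacement works out like this:

```python
BUF_LEN = 256
# Assumed size for a 64-bit ABI; actual sizeof(struct timespec) varies.
SIZEOF_TIMESPEC = 16

# arc4_stir writes nanotime() at the front of the buffer, then fills
# only the remainder with get_random_bytes() -- the timespec displaces
# pool output rather than being mixed with it.
random_bytes = BUF_LEN - SIZEOF_TIMESPEC
print(random_bytes)  # 240 bytes of pool output, 16 bytes of clock
```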

Anyway, back to the UWaterloo D & A RC4:
The simple relationships require a robust key schedule to destroy any
information that could be gleaned about the secret key from the resulting
keystreams. Unfortunately, the key schedule used in RC4 leaks information
when these methods are used, resulting in attacks, described by Fluhrer et
al. [5] that expose the secret key.

Oh crap! Attacks that expose the secret key! And to think I didn't know...

A better method is to use a
cryptographic hash function in order to obscure these relationships.
(For example, concatenating the key with the IV and then hashing with
SHA-1 to produce a session key.)
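That suggestion is simple to sketch; this is my own illustration with made-up key material, not anything from the OpenBSD tree:

```python
import hashlib

def session_key(secret_key: bytes, iv: bytes) -> bytes:
    # The paper's method: hash key||IV so the per-message key bears
    # no simple algebraic relation to the long-term secret.
    return hashlib.sha1(secret_key + iv).digest()

k = b"long-term secret"
k1 = session_key(k, b"\x00" * 16)
k2 = session_key(k, b"\x00" * 15 + b"\x01")
# A one-bit IV change yields an unrelated session key, unlike plain
# concatenation or XOR, which leak structure through RC4's key schedule.
print(k1 != k2, len(k1))  # True 20
```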

And, in fact, get_random_bytes is implemented as a sequence of MD5s over the main entropy pool.

OK, well here's a spot in D & A paper that mentions one-way functions:
5.4.5 Other Modified RC4 Ciphers
Previously other authors have modified RC4 in an attempt to defeat the
various attacks against it. []
VMPC Stream Cipher
The VMPC stream cipher, described by Zoltak [27] is a straightforward
modification of RC4 that attempts to gain security by replacing the output
function and the j update.  The function used is the so called VMPC one
way function.

It's a variant of RC4 that's considered notable specifically because it adds one-way-ness to the key input of the cipher.

So I think I'll stick by my story that "RC4 wasn't designed to be a one-way function".

* Here's a good summary of attacks:
http://www.schneier.com/paper-prngs.pdf

Yes, a wonderful paper about designing a PRNG for ONE consumer.

Really? I thought it was a survey of attacks. I'll have to read it again.

If you had 1000 consumers of PRNG data, would you design a PRNG that way?

That is the question which I think OpenBSD asked and proceeded to solve.

Whatever you mean by "that way", if the generator satisfies the defined properties and gave a healthy margin all round, I would say "yes".

That said, some designs could be expected to parallelize better than others.

* This is a thoroughly researched area of crypto. Perhaps you might
consider using a standard design? (If for no other reason than to save
endless discussions?)

I disagree strongly.  The field of research is a total disaster because
everyone tries to use ONE PRNG PER PURPOSE.

We don't do that.  We data-slice one PRNG with ALL CONSUMERS.

I don't know what that means.

Let's say I could sample the output of the RNG in every process and from every network device in the system. As much as I wanted. How could I tell the difference between "one prng per purpose" and "data-slicing one prng with all consumers"?

Maybe performance?
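Here's my reading of what "data-slicing one PRNG with all consumers" might mean, as a toy sketch (the class and its MD5-feedback keystream are my invention, not the kernel's): every consumer takes the next slice of a single keystream under one mutex.

```python
import hashlib
import threading

class SharedPRNG:
    """One generator, all consumers: each request takes the next
    slice of a single keystream under a mutex."""
    def __init__(self, seed: bytes):
        self.state = seed
        self.lock = threading.Lock()
    def bytes(self, n: int) -> bytes:
        with self.lock:
            out = b""
            while len(out) < n:
                self.state = hashlib.md5(self.state).digest()
                out += self.state
            return out[:n]

g = SharedPRNG(b"seed")
iv = g.bytes(16)       # IPsec IV consumer
isn = g.bytes(4)       # TCP ISN consumer
pid_salt = g.bytes(2)  # PID randomization consumer
print(len(iv), len(isn), len(pid_salt))  # 16 4 2
```

If the stream itself has no distinguisher, slices of one stream should look no different from independent streams, which is why I suspect an outside observer couldn't tell the two designs apart.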

Looks to me like you have:
"entropy pool" of theoretically 64K bits max
"kernel arc4random" @ 1684 bits shared system-wide under single mutex
"per-process arc4random" @ 1684 bits per process

It currently costs 2064 MD5 block evaluations when arc4_stir seeds from get_random_bytes, mostly over the same data repeatedly. But 1684/128 = 13.2, so that's 156 times more expensive than necessary!
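Taking the figures above at face value, the arithmetic checks out:

```python
actual_blocks = 2064   # MD5 block evaluations per arc4_stir (figure above)
state_bits = 1684      # effective arc4 state size in bits (figure above)
md5_output_bits = 128

# Minimum MD5 outputs needed to cover the state, and the overshoot factor.
needed = round(state_bits / md5_output_bits, 1)
ratio = round(actual_blocks / needed)
print(needed, ratio)  # 13.2 156
```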

If you have that kind of capacity to spend, I can think of some better places to spend it.

Everyone else is using standard designs, and as a result they do not
provide enough output data, and therefore, they make decisions that
consider PRNG output EXPENSIVE, and then use it less.

Well that's not the fault of a good algorithm or a good analysis. (Not that all of them are good).

I remember, you guys were the first on the block to have decent TCP ISNs, randomized PIDs, and other cool stuff too. Other vendors make their own decisions. If they don't think it's worth 256 bytes per process overhead, that's their business, but they ought not complain about the expense of doing it the hard way either.

My desktop machine is generating and consuming PRNG in many many
layers of the kernel PER PACKET, heck, per keystroke, while other
operating systems continue to use sequential IP id's.

Yeah they suck.

Please go study what we have done before talking about it.

Oh come on. Don't you think that's putting the cart before the horse a little bit?

Actually, I've been looking at the code as I composed this reply.

Regards,

- Marsh
