On 12/22/2010 02:34 PM, Theo de Raadt wrote:
>> Which is why I'm wondering what, exactly, this 'multi-consumer' design
>> feature is all about. Is it simply that more userland stuff is pinging
>> the kernel at unpredictable times, resulting in more timestamps feeding
>> into the central entropy pool? It seems like you could accomplish that
>> with any syscall. Or is there some other effect being claimed?
> Holy cow, you are dense.
"How to tell when someone wants to avoid certain questions..."
> I am going to throw out estimates here because (a) it has been a long
> time since we tested, and (b) so much can vary machine to machine.
> Without a hardware RNG device, a typical i386 desktop machine can
> provide (based on interrupt sources) around 1800 bytes of base entropy
> to the MD5 thrasher -- per minute.
Finally something concrete to discuss.
That ought to be enough for anybody once it gets going. Of course,
keeping one going is no great claim.
> Meanwhile, OpenBSD is consuming about 80 KB of arc4random output per
> minute.
Is that supposed to sound like a lot? I mean modern CSPRNGs generate
hundreds of MB per second, per core.
What do you mean exactly by "OpenBSD is consuming"? Are you referring to
the kernel or userland arc4random?
Probably you think these are ridiculous questions.
HMM PERHAPS I'LL GO LOOK AT THE SOURCE CODE AND FIND OUT FOR MYSELF.
Back.
I was wondering how this was supposed to work because other posters had
implied that OpenBSD's "many consumers" of arc4random were (through some
mechanism no one could explain) contributing so much entropy back into
the system pool that this design was categorically better.
So if all 80KB per minute were being consumed by one instance of
lib/libc/crypt/arc4random.c in one user process, it would 'stir' about
once every 20 minutes, adding about two or three nanotimes' worth of
uncertainty. But of course there are many user processes, and if you
meant that this 80 KB is divided among them, then each one stirs even
less often than that.
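The arithmetic behind that "once every 20 minutes" figure is worth making explicit. This is a back-of-envelope sketch; the 1,600,000-byte re-stir threshold is my reading of lib/libc/crypt/arc4random.c (treat it as an assumption), and the 80 KB/minute figure is the one quoted above.

```python
# Rough check of the "stir about once every 20 minutes" claim.
# Assumption: libc arc4random re-stirs after ~1,600,000 bytes of output.
consumption_per_min = 80 * 1024      # bytes/minute, Theo's figure above
stir_threshold = 1_600_000           # assumed bytes emitted before a re-stir

minutes_per_stir = stir_threshold / consumption_per_min
print(f"re-stir roughly every {minutes_per_stir:.1f} minutes")
```

Split that 80 KB across dozens of processes and each individual generator crosses its threshold far less often still.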
The reseeding of the kernel arc4random is on a fixed 10-minute timeout.
Since it's obviously broken to ask a kernel timeout to generate
unpredictable bits from its own clock, the timeout just sets a flag so
it stirs on the next request for data. Ironically, this has the effect
of making the bits from nanotime more predictable the more often that
data is queried from the kernel.
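To make the flag mechanism concrete, here is a minimal sketch of the pattern I'm describing -- not the actual kernel code, just the shape of it. The timeout only raises a flag; the stir (and its nanotime sample) happens at whatever moment the next consumer asks for data.

```python
# Sketch of "timeout sets a flag, stir happens on next request".
# Hypothetical names; the real kernel code differs in detail.
class FlaggedRNG:
    def __init__(self):
        self.reseed_requested = False
        self.stirs = 0

    def timeout_fires(self):          # the periodic 10-minute timeout
        self.reseed_requested = True  # no stirring here, just a flag

    def read(self, n):                # any consumer request for bytes
        if self.reseed_requested:
            self.reseed_requested = False
            self.stirs += 1           # stir (and nanotime sample) here
        return b"\x00" * n            # placeholder keystream

rng = FlaggedRNG()
rng.timeout_fires()
rng.read(16)      # this request triggers the stir
rng.read(16)      # this one does not
print(rng.stirs)  # -> 1
```

The irony falls out directly: the busier the consumers, the smaller the gap between the timeout firing and the next read, so the stir's nanotime is pinned ever more tightly to the timeout schedule.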
So the "many consumers" of random data are not feeding back significant
entropy by making unpredictable requests. Probably the main contribution
from the typical consumer is the one-time seeding after fork.
The other way in which the "many consumers" theory is said to be
justified is the idea that these consumers are reading varying-sized
segments from the kernel, and so the unpredictable bits of the size
parameter will embody sufficient entropy to thwart RC4-PRNG attacks.
Well? Has anyone looked at the statistical distribution?
How entropic do he be?
Oh wait, I'm not supposed to ask dumb questions.
I'LL GO LOOK AT THE CODE INSTEAD.
Callers of KERN_ARND sysctl:
ld.so/util.c: sizeof(unsigned int)
lib/libc/sys/stack_prot.c: sizeof(long[8])
lib/libc/crypt/arc4random.c: always 128
stdlib/random.c: sizeof(int32_t)*(1 or 31) per srandom setup
I'll just venture a guess that the stack_prot and ld.so are likely to be
called in the same way on every process startup. Plus, anything that
uses malloc will be initializing its arc4random pretty quickly too.
Probably most user processes will be consuming three fixed sized blocks
in the same order. Perhaps two or three common apps call stdlib's
srandom, resulting in another call of one of two sizes.
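For a sense of scale: if the request sizes really are dominated by a handful of fixed startup reads, the size parameter carries almost nothing. The tallies below are invented for illustration (loosely echoing the caller survey above), but the Shannon-entropy calculation is the right yardstick for "how entropic do he be".

```python
# How much entropy does the request-size parameter actually carry?
# The size counts here are hypothetical, skewed toward a few fixed
# startup reads as the caller survey above suggests.
from collections import Counter
from math import log2

sizes = Counter({4: 700, 128: 200, 16: 80, 32: 15, 124: 5})
total = sum(sizes.values())
shannon = -sum(c / total * log2(c / total) for c in sizes.values())
print(f"~{shannon:.2f} bits of entropy per request, from the size alone")
```

Well under two bits per request, under these assumptions -- not the stuff that thwarts RC4-PRNG attacks.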
So I'm not seeing much of this "many consumers" entropy coming from
userspace. True, every newly-forked process will generate some.
But a local attacker can watch the process list and probably get a
decent estimate on the nanotime values that the kernel will register.
From the kernel side, it may be another story. There may be zillions of
interrupts happening. But what about their lengths?
Well, almost 3/4 of the call sites request four bytes. The next most
common is probably 16 or 'sizeof(iv)'. Of course, there will be some
variation. But my guess is that the great majority of calls to
arc4random originating within the kernel itself will come from a few
locations and come in even fewer sizes.
Most likely I'm missing something; this needs real measurement.
Possibly the worst-case scenario would be something exactly like an
IPsec VPN deployment. Little or no new process creation, little or no
disk activity. Most interrupts coming from network events which are
visible (if not actively influenced) by an attacker.
If, in fact, an attacker did manage to get some analysis or a state
compromise on the kernel's arc4random, he could [never you mind]. Now,
by knowing the skipped distance he learns something about the number and
type of events that he missed and can put bounds on their nanotimes.
So no, I really don't think this "many consumers" concept is worth beans
when it comes to making up for serious deficiencies in RC4.
> How do you convert 1800 bytes of input to 81920 bytes of
> output, while giving references out to papers that don't solve this
> problem?
LaTeX!
> And how do you make it fast. Because all those papers try to solve
> the problem by making it SERIOUSLY SLOWER.
Whatever great performance you think you're getting from 8-bit RC4 is
not any better than something modern built on 64- or even 32-bit ops. As
one example, Skein fits in registers and generates solid CSPRNG data at
7 cycles/byte (similar to RC4 but secure). Or if you prefer to compare
apples-to-apples with RC4, you can cut down its round count until it
breaks. The current estimate of its safety factor is 2.9, implying you
could get around 2 cycles/byte (which is consistent with my experiments).
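The cycles-per-byte figures convert directly into throughput, which is why "fast" is a red herring here. A quick sketch, assuming a typical ~2 GHz core of the era (the clock speed is my assumption, not a measured number):

```python
# cycles/byte -> MB/s at an assumed clock rate.
clock_hz = 2_000_000_000  # assumed ~2 GHz core

for name, cycles_per_byte in [("Skein-as-PRNG", 7), ("reduced-round guess", 2)]:
    mb_per_s = clock_hz / cycles_per_byte / 1_000_000
    print(f"{name}: ~{mb_per_s:.0f} MB/s per core")
```

Even the conservative 7 cycles/byte figure is hundreds of MB per second per core -- four orders of magnitude beyond the 80 KB/minute being "consumed".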
> While all this is going on, each userland application is using random
> data out of its own libc arc4random for many purposes, including per
> malloc() and free() amongst many others, and is re-seeding its libc
> generator from the kernel as required, putting even more pressure on
> the kernel.
You seem to be quite concerned about "consuming entropy" from the pool.
Probably this is the big principle that you think I don't understand.
I think you have a 64K bit entropy pool and a pointlessly inefficient
extraction function in an attempt to compensate for the brokenness in
your stretching mechanism, particularly its lack of one-wayness.
> You don't know what you are talking about, and you don't seem to have the
> ability to wrap your mind around all the parts that are involved.
That is great, I am definitely going to use that one sometime.
> I am not reading your mails again.
Well, I shall not trouble this list any further.
Take care,
- Marsh