Well, if you had, say, a single thread collecting data to feed an entropy
pool, once an attacker synchronized on that, they'd win. Not sure that's
possible, but it's probably better for security if this is done inline by
each thread as needed. (Particularly when you consider the real OpenSSL
usage scenarios - web servers with a lot of running threads - good luck
making a timing attack work in that use case.)

There's one more point. The upper bits of those registers are easier to
guess than the lower ones. Again the 'fix' is obvious (keep only the low
bits); what's more difficult is knowing which of the lower bits are
actually changing.

E.g. on the P4 the lower 4 bits are effectively 'stuck', as every
instruction is a multiple of 16 clocks long. Quite a few processors have
quirks here.
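
A rough way to find out which low bits actually move is to XOR
consecutive counter readings and see which bit positions ever flip. The
sketch below is only an illustration: the function names are made up for
this example, and the samples are plain integers you would have to
collect from the real counter yourself.

```python
def changing_bits(samples):
    """Return a mask of the bit positions that differ between at least
    one pair of consecutive samples."""
    mask = 0
    for a, b in zip(samples, samples[1:]):
        mask |= a ^ b
    return mask


def stuck_low_bits(samples):
    """Count how many of the lowest bits never changed across samples.

    Returns None when no bit changed at all, since nothing can be
    concluded from such a run.
    """
    mask = changing_bits(samples)
    if mask == 0:
        return None
    # Index of the lowest set bit == number of constant bits below it.
    return (mask & -mask).bit_length() - 1


if __name__ == "__main__":
    # Hypothetical readings from a counter that advances in steps of 16.
    readings = [0, 16, 48, 112, 128, 400]
    print(stuck_low_bits(readings))
```

A bit that never flips in a large run of samples contributes nothing and
should not be counted as entropy, whatever the register width suggests.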

Pete




From:   Andy Polyakov <ap...@openssl.org>
To:     openssl-dev@openssl.org
Date:   23/01/2012 03:38
Subject:        Re: OS-independent entropy source?
Sent by:        owner-openssl-...@openssl.org



> HT processors are a nightmare for security yes :).

I've attempted the experiment even on a hyper-threading P4. No anomalies,
in the sense that it looks pretty much like another P4. Well, one thread
appears to get more interrupts, while spikes tend to be higher on the
other thread. But when it comes to the "fine print", i.e. variations
between interrupts, there is no essential difference, and the
cross-correlation looks essentially the same as on real multi-core. No
maximum at zero lag though... On second thought, why would there be a
difference, when every sample takes several *hundred* clock cycles to
complete? Hyper-threading operates at single clock cycle resolution, not
hundreds, right?
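
For anyone who wants to repeat the comparison, here is a minimal sketch
of a normalized cross-correlation over a range of lags. It is not the
script used for the experiment above; the function name and the lag
convention (x[i] is compared with y[i + lag]) are assumptions made for
this example.

```python
from statistics import fmean


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-rate sample series.

    Returns a dict mapping lag -> coefficient, where a positive lag
    means y is delayed relative to x.
    """
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x, mean_y = fmean(x), fmean(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += dx[i] * dy[j]
        result[lag] = total / norm
    return result


if __name__ == "__main__":
    # Two made-up pulse trains; the second is the first delayed by 2.
    a = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    b = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    corr = cross_correlation(a, b, 4)
    print(max(corr, key=corr.get))
```

A pronounced maximum at some lag would mean one thread's samples predict
the other's; a flat curve is what you hope to see.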

> You are assuming the target software is collecting data continuously as
> fast as it can - which I agree, simply turns it into the designated
> victim :). Don't do that - the data rate is high enough you can sample
> on demand and you can afford some delay between samples.

But data will have to be collected in "bursts" and not exactly short
ones, e.g. ~700 samples or 300 microseconds are suggested on the page,
and initial calibration can take tens of milliseconds... Would it be
appropriate to say that these are not long enough to detect and
synchronize on? [Naturally, provided that detection and synchronization
can give the adversary an edge.] Assuming that collection is continuous
is simply a first approximation of the problem...
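
To make the "burst on demand" idea concrete, here is a small sketch that
collects a fixed number of timer deltas only when called. It is an
illustration under assumptions, not the code from the page:
time.perf_counter_ns stands in for the CPU cycle counter and is much
coarser, collect_burst is a made-up name, and the 700 default is just the
figure quoted above. The raw deltas are not usable as random numbers
without conditioning.

```python
import time


def collect_burst(n_samples=700, clock=time.perf_counter_ns):
    """Collect one burst of n_samples deltas between successive clock
    readings. Nothing runs between calls, so there is no continuous
    collector for an observer to lock on to."""
    deltas = []
    prev = clock()
    for _ in range(n_samples):
        now = clock()
        deltas.append(now - prev)
        prev = now
    return deltas


if __name__ == "__main__":
    start = time.perf_counter_ns()
    burst = collect_burst()
    elapsed_us = (time.perf_counter_ns() - start) / 1000
    print(len(burst), "samples in", elapsed_us, "microseconds")
```

The clock is passed in as a parameter so that a different counter can be
substituted without touching the loop.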

> And make sure your sample collection code is branch free - you can still
> attack it via the cache, but it's a lot harder to know exactly where the
> victim is and your attack code has to be able to get that exactly right.

Loop bodies are branch-free on all platforms. Though I don't think it
matters a lot, because, once again, each sample takes several *hundred*
cycles, much more than [mis-]branch penalties.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org


