Re: Optimisation Considered Harmful

2005-06-25 Thread D. J. Bernstein
Here's an amusing example of optimization: On the PowerPC 7450 (G4e),
integer multiplication is faster by one cycle if the second operand is
between -131072 and 131071. Ever use multiplication in cryptography?

Jerrold Leichter writes:
 There are only a couple of roads forward:
   - Develop algorithms that offer reasonable performance even if
   implemented in unoptimized ways.

We already have some secret-key ciphers that naturally run in constant
time on current CPUs. One example is my own Salsa20, which is a quite
conservative design but still faster than AES. Some other examples are
Phelix and good old TEA.

Furthermore, most CPUs have constant-time floating-point multiplication.
Floating-point multiplication usually turns out to be the fastest way to
do integer arithmetic in RSA etc., although it takes some effort to use.

The quick summary is that we _can_ have high-speed constant-time
cryptography. The algorithms are already there---although they need to
be distinguished from the impostors such as Rijndael. The implementation
techniques are already there---although they need to be more widely
understood and used.

 I can't recall ever seeing a benchmark that reported the
 variance of timing across instances of the loop.

For years now I've been reporting multiple individual timings rather
than averages. See, e.g., http://cr.yp.to/mac/poly1305-20041101.pdf,
Appendix B; those are the AES timings that prompted me to start
investigating cache-timing attacks.

(Subsequent versions of the poly1305 paper report even more timing
information but, for space reasons, have to compress the information
into small graphs. Big tables are on the web.)

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Optimisation Considered Harmful

2005-06-25 Thread Jerrold Leichter
| Suppose you have something that is inadvertently an
| oracle - it encrypts stuff from many different users
| preparatory to sending it out over the internet, and
| makes no effort to strongly authenticate a user.
| 
| Have it encrypt stuff into a buffer, and on a timer
| event, send out the buffer
| The problem is with edges:
| 
| Suppose the timer goes off every 10ms.
| 
| You have an operation that takes either 5ms or 15ms, depending on
| whether a chosen bit of the key is 1 or 0.
| 
| Whether or not a given time slot is occupied with results will emit
| whether the bit was 1 or 0.
| 
| Now, suppose your timer goes off every 200ms.  No problem, right?
| 
| At time=190ms, you force an encryption.  If it's done by the time=200ms
| deadline, you know
thus bringing us back to an observation that goes back maybe 30 years:  Timing
channels are usually impractical to eliminate, but you can lower the bit rate
(or decrease the S/N ratio, which comes down to the same thing).  For the case
at hand, we've lowered the bit rate by a factor of 20.

If you're willing to change the protocol a bit, you can do better:  Never send
a block whose encryption crossed the boundary.  Instead, send a special "ignore
me" block.  (Obviously, the "ignore me" marker is *inside* the encrypted
envelope.)  Depending on the nature of the protocol - is the size of a response
invariant, or is the number of exchanges invariant? - you then continue either
by sending the real block, or by the sender returning an "ignore me" block
which you reply to with the encrypted block that crossed the boundary.  This
still doesn't completely remove the timing channel - there are differences in
the data stream depending on whether you cross the boundary - but they are
much subtler, so again you've decreased the bit rate.  (It's not clear how to
quantify the improvement in this case, though.)

-- Jerry



Re: Optimisation Considered Harmful

2005-06-25 Thread bear


On Thu, 23 Jun 2005, Jerrold Leichter wrote:

 Consider what it means to create an optimizing compiler (or some
 kind of optimization in hardware - the issues are the same, but
 I'll talk in terms of a compiler for definiteness.)  The input is
 source code; the output is a bunch of instructions.  A compiler's
 transformation is correct if it's semantics-preserving:

Yep.  As a sometime compiler weenie, I'll give literal answers
to your next few questions.

 The big issue in compiler optimization - and even more so in some
 hardware optimization - is defining exactly what semantics has to be
 preserved.  For example, must computations be done even if the
 compiler can determine that they cannot affect the output?

No.  There's no language higher than assembly that requires that.  A
lot of compilers will do the redundant computations anyway because
the authors of those compilers considered that it would be too much
work to prove what can be eliminated, and in imperative languages
like C there's often little benefit in doing so.  But there are
transformations and techniques you can do (autoconverting to laziest
semantics allowed by the language spec) to absolutely determine what
can be eliminated, and emit code that *only* evaluates that which
does affect the output.  And increasingly, compilers are doing them.

 Can the
 rules of algebra be used to rearrange expressions (possibly breaking
 carefully crafted error management strategies, since floating point
 arithmetic doesn't actually obey the rules of algebra)?

If it changes the output, it's considered an error for C compilers to
do this.  That said, it's considered a minor error, and since a lot of
people are in a great hurry to get wrong answers, GCC will knowingly
and willingly commit this error for you (as an extension to the
language) if you give it a -O option that's too high.  And several
other compilers (including Borland's and Microsoft's) have committed
this error in the past, either without giving anyone the ability to
turn it off, or by default (with a --fpstrict option to turn it off).

That said, in languages other than C you have to look to the language
spec to see whether this is allowed or not.  If the spec is silent on
the topic then rearranging in a way that changes output is still
usually considered a bug, but it becomes open to debate whether it is
an actual bug or not.

Some language specs (including Common Lisp, IIRC) and several
single-implementation languages (including Mathematica, IIRC) allow
arbitrary rearrangement of floating-point operations, provided that
the compiler can prove that *at least* as much precision is preserved
as would be gained by doing it in the order it appears in the source.
This is explicitly allowed to change output, provided the compiler can
prove that the changed output is a closer approximation to the
algebraic solution.  Even in languages where this is allowed, many
implementors consider it tasteless and try to avoid doing it,
conforming their compilers' behavior to the principle of
least surprise.

  Do writes
 to two variables written in order in the source code have to occur
 in that order in the object code?

In C, the answer depends on whether the variables are declared
volatile.  If they are not declared volatile, then the writes need not
appear in any particular order, and if the compiler can prove the
values are not used, need not appear at all.  In languages without
volatile variables, the answer is usually no.

 If two writes are issued in order
 to the hardware, do they have to hit memory in that order?

Uh, I think the short answer is no, but you'll never be able to
tell the difference in normal circumstances.  It's common for a
superscalar chip to buffer memory writes in its L1 Cache, and
write a cache line to memory at a go (which may reorder the writes
to memory that's off the chip).  But no software running on the
same chip will be able to tell by reading those addresses,
because it will snag the written values (even though they
haven't hit physical memory yet) out of the L1 cache.

Where this can break down in a detectable way is with volatile
variables on multiprocessor machines.  But those are normally
protected by mutexes, the implementation of which must trigger
explicit L1 cache flushes on multiprocessor machines.

 An understanding of what semantics has to be preserved, and what
 semantics is side-effect and can be tossed to gain performance,
 has gradually emerged in the hardware and software communities.
 There have been, and continue to be, missteps along the way.  Some
 of the weaker memory consistency models might have helped the
 hardware guys, but were just too hard for the software guys to deal
 with.  Newbies to multi-threaded programming are caught by the
 not-so obvious memory access semantics present even at the language
 level in all common programming languages.  (Java tried to pin this
 down exactly and got it completely wrong for several tries.)

Re: Optimisation Considered Harmful

2005-06-25 Thread James A. Donald
--
James A. Donald:
  Suppose you have something that is inadvertently an 
  oracle - it encrypts stuff from many different users 
  preparatory to sending it out over the internet, and 
  makes no effort to strongly authenticate a user.
 
  Have it encrypt stuff into a buffer, and on a timer 
  event, send out the buffer.
 
  Your code is now of course multithreaded - very easy 
  to get multithreading bugs that never show up during 
  testing, but non deterministically show up in actual 
  use.

On 24 Jun 2005 at 12:25, Dan Kaminsky wrote:
 The problem is with edges:

 Now, suppose your timer goes off every 200ms.  No 
 problem, right?

 At time=190ms, you force an encryption.  If it's done 
 by the time=200ms deadline, you know.

It should have been needless to say that at the end of 
each time frame, the oracle only starts sending out 
stuff encrypted in response to data received at least n 
time frames previously, where n is a small positive 
number, possibly one.

A time frame is longer than the difference between the 
quickest and slowest encryption of a block.  n time 
frames is longer than the slowest encryption of a block. 

--digsig
 James A. Donald
 6YeGpsZR+nOTh/cGwvITnSR3TdzclVpR0+pr3YYQdkG
 JdXC3IuQNnYvM2SrAOIY2iLJyhKf21IR191yeebK
 4FIl5EvQ0dseZCj2m2/NsQANv7tID98AAQ+pJMARn





Re: Optimisation Considered Harmful

2005-06-24 Thread Victor Duchovni
On Thu, Jun 23, 2005 at 07:36:38AM -0400, Jerrold Leichter wrote:

   - Develop algorithms that offer reasonable performance even if
   implemented in unoptimized ways.  This will be difficult
   to maintain in the face of ever-increasing hardware
   optimizations that you can't just turn off by not using -O.
 
   - Live with less performance and hope that raw hardware speeds will
   catch up.
 
   - Use specialized hardware, designed not to leak side-channel
   information.
 
   - ?

- Find reasonably efficient masking strategies, that assume
that side-channel attacks are here to stay, and randomly choose
one of many isomorphic ways to perform the computation. The
masking would have to eliminate key/data correlation from all
observables other than the final output.


-- 

 /\  ASCII RIBBON    Victor Duchovni    NOTICE: If received in error,
 \ / CAMPAIGN        IT Security,       please destroy and notify sender.
  X  AGAINST         Morgan Stanley     Sender does not waive confidentiality
 / \ HTML MAIL                          or privilege, and use is prohibited.



Re: Optimisation Considered Harmful

2005-06-24 Thread Ben Laurie

Victor Duchovni wrote:

On Thu, Jun 23, 2005 at 07:36:38AM -0400, Jerrold Leichter wrote:



- Develop algorithms that offer reasonable performance even if
implemented in unoptimized ways.  This will be difficult
to maintain in the face of ever-increasing hardware
optimizations that you can't just turn off by not using -O.

- Live with less performance and hope that raw hardware speeds will
catch up.

- Use specialized hardware, designed not to leak side-channel
information.

- ?



- Find reasonably efficient masking strategies, that assume
that side-channel attacks are here to stay, and randomly choose
one of many isomorphic ways to perform the computation. The
masking would have to eliminate key/data correlation from all
observables other than the final output.


If it does that, why do you want to choose one of many? Surely a single 
one will do?


--
ApacheCon Europe   http://www.apachecon.com/

http://www.apache-ssl.org/ben.html   http://www.thebunker.net/

There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit. - Robert Woodruff



Re: Optimisation Considered Harmful

2005-06-24 Thread Victor Duchovni
On Fri, Jun 24, 2005 at 10:00:55AM +0100, Ben Laurie wrote:

  - Find reasonably efficient masking strategies, that assume
  that side-channel attacks are here to stay, and randomly choose
  one of many isomorphic ways to perform the computation. The
  masking would have to eliminate key/data correlation from all
  observables other than the final output.
 
 If it does that, why do you want to choose one of many? Surely a single 
 one will do?
 

The idea is that each choice leaks side-channel information about its
algorithm, but the attacker does not know which one was chosen. And,
repeated observations do not on average (over all algorithms) show
correlation between the key or data and side-channel information (other
than the final output). Is this possible? There is a paper that claims
no correlation with any single intermediate result; is that strong
enough?




Re: Optimisation Considered Harmful

2005-06-24 Thread James A. Donald
--
On 23 Jun 2005 at 0:50, Ben Laurie wrote:

 A brief altercation this evening with CERT over the
 recent hyperthread caching issues has brought
 something that's been simmering at the back of my
 brain to the forefront.

 The recent hyperthread/cache key recovery trick,
 followed by DJB's related (IMO) symmetric key
 recovery, and preceded by the RSA timing attacks
 (Boneh et al?) are all examples of attacks on the same
 thing: optimisation.

 The problem is that if different paths through your
 code expose different availability of optimisation,
 then there's a timing attack available. I predict, for
 example, that someone will find a way to attack
 something using pipeline breaks on Pentium
 processors[1].

 How do we fix this? Well, its easy to say: we make
 everything equally crap - we make sure that all
 operations are as slow as the slowest possible variant
 can be. However, on a modern processor, this is _very_
 hard to do.

Suppose you have something that is inadvertently an
oracle - it encrypts stuff from many different users
preparatory to sending it out over the internet, and
makes no effort to strongly authenticate a user.

Have it encrypt stuff into a buffer, and on a timer
event, send out the buffer.

Your code is now of course multithreaded - very easy to
get multithreading bugs that never show up during
testing, but non deterministically show up in actual
use. 

--digsig
 James A. Donald
 6YeGpsZR+nOTh/cGwvITnSR3TdzclVpR0+pr3YYQdkG
 fWkmIPqr+sQN9GW27vahB3Bc9ulLdzbGrPKEjXFL
 4nPDlKsQgDKH6LEnS3M7ECcBByW0lH5o7CUzo2UYB




Re: Optimisation Considered Harmful

2005-06-24 Thread Dan Kaminsky

Suppose you have something that is inadvertently an
oracle - it encrypts stuff from many different users
preparatory to sending it out over the internet, and
makes no effort to strongly authenticate a user.

Have it encrypt stuff into a buffer, and on a timer
event, send out the buffer.

Your code is now of course multithreaded - very easy to
get multithreading bugs that never show up during
testing, but non deterministically show up in actual
use. 
  

The problem is with edges:

Suppose the timer goes off every 10ms.

You have an operation that takes either 5ms or 15ms, depending on
whether a chosen bit of the key is 1 or 0.

Whether or not a given time slot is occupied with results will emit
whether the bit was 1 or 0.

Now, suppose your timer goes off every 200ms.  No problem, right?

At time=190ms, you force an encryption.  If it's done by the time=200ms
deadline, you know.

Things get trickier when there's random noise in the timer, and it
matters whether the distribution of 1's and 0's is equal or not.  But
this is fundamentally a difficult problem to handle.

--Dan




Re: Optimisation Considered Harmful

2005-06-23 Thread Jerrold Leichter
| A brief altercation this evening with CERT over the recent hyperthread caching
| issues has brought something that's been simmering at the back of my brain to
| the forefront.
| 
| The recent hyperthread/cache key recovery trick, followed by DJB's related
| (IMO) symmetric key recovery, and preceded by the RSA timing attacks (Boneh et
| al?) are all examples of attacks on the same thing: optimisation.
| 
| The problem is that if different paths through your code expose different
| availability of optimisation, then there's a timing attack available. I
| predict, for example, that someone will find a way to attack something using
| pipeline breaks on Pentium processors[1].
|
| How do we fix this? Well, its easy to say: we make everything equally crap -
| we make sure that all operations are as slow as the slowest possible variant
| can be. However, on a modern processor, this is _very_ hard to do.
| 
| Could it be that for safe crypto we should be using Z80s?
This is an excellent point, as it reveals a deep and significant parallel
between cryptography and compiler/hardware optimization.

Consider what it means to create an optimizing compiler (or some kind of
optimization in hardware - the issues are the same, but I'll talk in terms of a
compiler for definiteness.)  The input is source code; the output is a bunch
of instructions.  A compiler's transformation is correct if it's
semantics-preserving:  The source has some meaning, and the object code correctly
represents that meaning.  There are an infinite number of possible object 
codes that preserve the input semantics.  Some are better than others with 
respect to some objective function, say size or speed.  An optimizing compiler 
simply chooses a better object code than some reference choice.

The big issue in compiler optimization - and even more so in some hardware
optimization - is defining exactly what semantics has to be preserved.  For
example, must computations be done even if the compiler can determine that
they cannot affect the output?  Can the rules of algebra be used to rearrange
expressions (possibly breaking carefully crafted error management strategies,
since floating point arithmetic doesn't actually obey the rules of algebra)?
Do writes to two variables written in order in the source code have to occur
in that order in the object code?  If two writes are issued in order to the
hardware, do they have to hit memory in that order?

An understanding of what semantics has to be preserved, and what semantics is
side-effect and can be tossed to gain performance, has gradually emerged in
the hardware and software communities.  There have been, and continue to be,
missteps along the way.  Some of the weaker memory consistency models might
have helped the hardware guys, but were just too hard for the software guys
to deal with.  Newbies to multi-threaded programming are caught by the
not-so-obvious memory access semantics present even at the language level in all
common programming languages.  (Java tried to pin this down exactly and got it
completely wrong for several tries.)

Enter cryptographic algorithms.  On their face, these are simple mathematical
transformations.  You have to really abuse the implementations (e.g., having
multiple side-effect-producing operations in one statement) to get into an area
where a programmer's or hardware developer's warning bell would sound "watch
the semantics."  And, in fact, from the point of view of input/output
transformations, there really are no semantic issues.  The problem is that
these side-channel attacks broaden the meaning of the program to something
that has never been considered in previous work that I know of.  (The closest
you are likely to find is in complaints by real-time programmers that modern
machines give you no way to determine how long an instruction sequence will
really take:  You might take a page fault, or a cache miss, or who knows what
along the way, and in some real-time code, you have to *know* that that won't
happen.  Such code really is sometimes run only on machines like Z80's!)

What can be done?  Well, the first thing that's clearly needed is a more
complete specification of the semantics of cryptographic algorithms.  Just
the input/output transformation - which is all we write down in most analyses
- is insufficient.  We sometimes state things informally, almost in passing -
as in the comments on AES that "table accesses take constant time."  We
certainly assume things like the time to add two numbers is independent of
the number of carries - which is probably true on machines today, but may
actually have been false at one time.  Without a way to write down what
matters, we have no way to judge whether a particular compilation/hardware
approach is safe.  There's some work on abstract program interpretation that
might help here (though it's mainly aimed in other directions).

Ultimately, performance is likely to suffer.  Software and hardware
optimizations are 

Optimisation Considered Harmful

2005-06-22 Thread Ben Laurie
A brief altercation this evening with CERT over the recent hyperthread 
caching issues has brought something that's been simmering at the back 
of my brain to the forefront.


The recent hyperthread/cache key recovery trick, followed by DJB's 
related (IMO) symmetric key recovery, and preceded by the RSA timing 
attacks (Boneh et al?) are all examples of attacks on the same thing: 
optimisation.


The problem is that if different paths through your code expose 
different availability of optimisation, then there's a timing attack 
available. I predict, for example, that someone will find a way to 
attack something using pipeline breaks on Pentium processors[1].


How do we fix this? Well, its easy to say: we make everything equally 
crap - we make sure that all operations are as slow as the slowest 
possible variant can be. However, on a modern processor, this is _very_ 
hard to do.


Could it be that for safe crypto we should be using Z80s?

Cheers,

Ben.

[1] For those who don't know, even old Pentia have several different 
processing units internally, which run in parallel. They can even tell 
when instructions in one's pipeline conflict with another's (e.g. one 
uses a register as input that another is going to change, but hasn't 
yet), and delay processing appropriately. However, sometimes they can't 
work it out (this is, of course, another optimisation, this time for 
transistor count) and so they just throw their hands up, stop all the 
pipelines and start again. This causes a _substantial_ delay, easily 
measurable - I know of loops of tens of instructions that go twice as 
fast if apparently redundant - but pipeline-intelligence-informing - 
instructions are inserted. Of course, the PowerPCs with their 
speculative execution are probably even more fun in this respect (do 
they still do that?).


