Re: Fast MAC algorithms?

2009-08-02 Thread Zooko Wilcox-O'Hearn
I recommend Poly1305 by DJB or VMAC by Ted Krovetz and Wei Dai. Both
are much faster than HMAC, and both have security proofs stated in
terms of an underlying block cipher.


VMAC is implemented in Wei Dai's nice Crypto++ library; Poly1305 is
implemented by DJB and is also in his new NaCl library.


http://cryptopp.com/benchmarks-amd64.html

It reports VMAC(AES)-64 at 0.6 cycles per byte (although watch out
for the 3971 cycles to set up key and IV), compared to HMAC-SHA1 at
11.2 cycles per byte (after 1218 cycles to set up key and IV).
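
For a rough sanity check of relative speeds on your own hardware, here is
a minimal timing harness; it assumes PyCryptodome for Poly1305-AES (hmac
and hashlib are in the Python standard library), and since Python call
overhead dominates, treat the numbers as relative only:

    import time, hmac, hashlib
    from Crypto.Hash import Poly1305
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    MSG = b"\x00" * (1 << 20)  # 1 MiB of data per call

    def bench(label, fn, reps=20):
        # Time reps MAC computations over MSG and report throughput.
        start = time.perf_counter()
        for _ in range(reps):
            fn()
        secs = time.perf_counter() - start
        print(f"{label:14s} {reps * len(MSG) / secs / 1e6:8.1f} MB/s")

    k20, k32 = get_random_bytes(20), get_random_bytes(32)
    bench("HMAC-SHA1", lambda: hmac.new(k20, MSG, hashlib.sha1).digest())
    # Poly1305-AES picks a fresh random nonce per call, so the per-message
    # key/IV setup cost noted above is included in the measurement.
    bench("Poly1305-AES",
          lambda: Poly1305.new(key=k32, cipher=AES, data=MSG).digest())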


If you do any measurement comparing Poly1305 to VMAC, please report  
your measurement, at least to me privately if not to the list.  I can  
use that sort of feedback to contribute improvements to the Crypto++  
library.  Thanks!


Regards,

Zooko Wilcox-O'Hearn
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
store your data: $10/month -- http://allmydata.com/?tracking=zsig
I am available for work -- http://zooko.com/résumé.html



Re: Fast MAC algorithms?

2009-08-02 Thread James A. Donald

Joseph Ashwood wrote:

RC-4 is broken when used as intended.

...

If you take these into consideration, can it be used correctly?


James A. Donald:

Hence tricky


Joseph Ashwood wrote:
By the same argument a Vigenère cipher is tricky to use securely, same 
with monoalphabetic and even Caesar. Not that RC4 is anywhere near the 
brokenness of Vigenère, etc., but the same argument can be applied, so 
the argument is flawed.


You cannot use a Vigenère cipher securely. You can use an RC4 cipher 
securely: to use RC4 securely, discard the first hundred bytes of 
output, and renegotiate the key every gigabyte.
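
As a concrete illustration of that recipe (a sketch, not production
code): pure-Python RC4 with a dropped prefix and a per-key byte budget.
The constants are assumptions; "the first hundred bytes" above is the
stated floor, while common RC4-drop practice discards 768-3072.

    N_DROP = 256         # "first hundred bytes" is the floor stated above;
                         # RC4-drop deployments commonly discard 768-3072
    REKEY_LIMIT = 2**30  # renegotiate the key after a gigabyte

    class RC4Drop:
        def __init__(self, key):
            s = list(range(256))
            j = 0
            for i in range(256):  # key-scheduling algorithm (KSA)
                j = (j + s[i] + key[i % len(key)]) & 0xFF
                s[i], s[j] = s[j], s[i]
            self.s, self.i, self.j, self.used = s, 0, 0, 0
            self.keystream(N_DROP)  # throw away the biased prefix
            self.used = 0

        def keystream(self, n):
            s, out = self.s, bytearray()
            for _ in range(n):
                self.i = (self.i + 1) & 0xFF
                self.j = (self.j + s[self.i]) & 0xFF
                s[self.i], s[self.j] = s[self.j], s[self.i]
                out.append(s[(s[self.i] + s[self.j]) & 0xFF])
            self.used += n
            if self.used > REKEY_LIMIT:
                raise RuntimeError("gigabyte used up: renegotiate the key")
            return bytes(out)

        def crypt(self, data):  # XOR with keystream; same call decrypts
            return bytes(a ^ b for a, b in zip(data, self.keystream(len(data))))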





Re: Fast MAC algorithms?

2009-08-02 Thread Joseph Ashwood

--
From: James A. Donald jam...@echeque.com
Subject: Re: Fast MAC algorithms?


Joseph Ashwood wrote:

RC-4 is broken when used as intended.

...

If you take these into consideration, can it be used correctly?


James A. Donald:

Hence tricky


Joseph Ashwood wrote:

By the same argument a Vigenère cipher is tricky to use securely, same
with monoalphabetic and even Caesar. Not that RC4 is anywhere near the
brokenness of Vigenère, etc., but the same argument can be applied, so the
argument is flawed.


You cannot use a Vigenère cipher securely. You can use an RC4 cipher
securely: to use RC4 securely, discard the first hundred bytes of output,
and renegotiate the key every gigabyte.


The way to use a Vigenère securely is to apply an all-or-nothing transform 
to the plaintext and then encrypt; this leaves the attacker facing more 
entropy in the system than its size, and therefore a OTP. There are other 
ways, but this method is not significantly more complex than the efforts 
necessary to secure RC4, and it results in provable secrecy. It is just 
tricky to use a Vigenère securely.
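
No specific AONT is named above; a minimal sketch of one, Rivest's
package transform, follows, assuming PyCryptodome and a message already
padded to a multiple of 16 bytes. No block of the output can be inverted
until all of the blocks are in hand:

    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    PUBLIC_KEY = b"\x00" * 16  # fixed, publicly known key in the construction

    def _xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def _blk(i):
        return i.to_bytes(16, "big")

    def package(message):
        # Mask each block under a random inner key, then append that key
        # XORed with a digest (under the public key) of all masked blocks.
        assert len(message) % 16 == 0
        inner = get_random_bytes(16)
        e_in = AES.new(inner, AES.MODE_ECB)
        e_pub = AES.new(PUBLIC_KEY, AES.MODE_ECB)
        blocks = [message[i:i + 16] for i in range(0, len(message), 16)]
        out = [_xor(m, e_in.encrypt(_blk(i + 1))) for i, m in enumerate(blocks)]
        tail = inner
        for i, c in enumerate(out):
            tail = _xor(tail, e_pub.encrypt(_xor(c, _blk(i + 1))))
        return b"".join(out) + tail

    def unpackage(blob):
        # Recover the inner key from the tail, then unmask every block.
        blocks = [blob[i:i + 16] for i in range(0, len(blob), 16)]
        out, inner = blocks[:-1], blocks[-1]
        e_pub = AES.new(PUBLIC_KEY, AES.MODE_ECB)
        for i, c in enumerate(out):
            inner = _xor(inner, e_pub.encrypt(_xor(c, _blk(i + 1))))
        e_in = AES.new(inner, AES.MODE_ECB)
        return b"".join(_xor(c, e_in.encrypt(_blk(i + 1)))
                        for i, c in enumerate(out))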
   Joe 




Protocol Construction WAS Re: Fast MAC algorithms?

2009-08-02 Thread Joseph Ashwood

--
From: Ray Dillinger b...@sonic.net
Subject: Re: Fast MAC algorithms?


I mean, I get it that crypto is rarely the weakest link in a secured
application.  Still, why are folk always designing and adopting
cryptographic tools for the next decade or so instead of for the
next few centuries?


Because we have no idea how to do that. If you had asked six months ago, we 
would have said AES-256 will last at least a decade, probably 50 years. A few 
years before that, we were saying that SHA-1 was a great cryptographic hash. 
Running the math a few years ago, I determined that, on the trajectory of 
cryptographic research, it would have been necessary to create a hash of well 
over 1024 bits, with behavior perfect by today's knowledge, just to last a 
human lifetime. Since then the trajectory has changed significantly, and the 
same exercise today would probably result in 2000+ bits; extrapolating the 
trajectory of the trajectory, the size would be entirely unacceptable. So, in 
short, collectively we have no idea how to make something secure for that 
long.



So far, evidence supports the idea that the stereotypical Soviet
tendency to overdesign might have been a better plan after all,
because the paranoia about future discoveries and breaks that motivated
that overdesign is being regularly proven out.


And that is why Kelsey found an attack on GOST, and why it has a class of 
weak keys. That is the problem: all future attacks are, rather by definition, 
a surprise.



This is fundamental infrastructure now!  Crypto decisions now
support the very roots of the world's data, and the cost of altering
and reversing them grows ever larger.


By scheduling likely times for upgrades, the costs can be assessed better 
and scheduled better, which works far better for business than the "OH 
NO, OUR ALGORITHM IS BROKEN" experience that always results from trying to 
plan for longer than a few years at a time. It is far cheaper to build within 
the available knowledge, and design for a few years.




If you can deploy something once, even something that uses three
times as many rounds or key bits as you think now that you need,


Neither of those is a strong indicator of security. AES makes a great 
example: AES-256 has more rounds than AES-128, AES-256 has twice as many key 
bits as AES-128, and yet AES-256 has more attacks against it than AES-128. An 
increasing number of attack types are unaffected by the number of rounds, and 
key bits have rarely been a real issue.


There is no way to predict the far future of cryptography; it is hard enough 
to predict the reasonably near future.
   Joe 




Re: Fast MAC algorithms?

2009-08-01 Thread Joseph Ashwood

--
From: James A. Donald jam...@echeque.com
Subject: Re: Fast MAC algorithms?


james hughes wrote:


On Jul 27, 2009, at 4:50 AM, James A. Donald wrote:
No one can break arcfour used correctly - unfortunately, it is tricky to 
use it correctly.


RC-4 is broken when used as intended.

...

If you take these into consideration, can it be used correctly?


Hence tricky


By the same argument a Vigenère cipher is tricky to use securely, same 
with monoalphabetic and even Caesar. Not that RC4 is anywhere near the 
brokenness of Vigenère, etc., but the same argument can be applied, so the 
argument is flawed.


The question is: what level of heroic effort is acceptable before a cipher 
is considered broken? Is AES-256 still secure? 3DES? Right now, AES-256 
seems to me to be about at the line: it doesn't take significant effort to 
use it securely, and the impact on the security of modern protocols is 
effectively zero, so it doesn't need to be retired, but I wouldn't recommend 
it for most new protocol purposes. RC4 takes excessive heroic efforts to 
avoid the problems, and even teams with highly skilled members have gotten 
it horribly wrong. Generally, using RC4 is foolish at best.
   Joe 




Re: Fast MAC algorithms?

2009-07-26 Thread James A. Donald

From: Nicolas Williams nicolas.willi...@sun.com

For example, many people use arcfour in SSHv2 over AES because arcfour
is faster than AES.


Joseph Ashwood wrote:
I would argue that they use it because they are stupid. ARCFOUR should 
have been retired well over a decade ago, it is weak, it meets no 
reasonable security requirements,


No one can break arcfour used correctly - unfortunately, it is tricky to 
use it correctly.




Re: Fast MAC algorithms?

2009-07-26 Thread james hughes


On Jul 27, 2009, at 4:50 AM, James A. Donald wrote:


From: Nicolas Williams nicolas.willi...@sun.com
For example, many people use arcfour in SSHv2 over AES because arcfour
is faster than AES.


Joseph Ashwood wrote:
I would argue that they use it because they are stupid. ARCFOUR  
should have been retired well over a decade ago, it is weak, it  
meets no reasonable security requirements,


No one can break arcfour used correctly - unfortunately, it is  
tricky to use it correctly.


RC-4 is broken when used as intended. The output has a statistical  
bias and can be distinguished.

http://www.wisdom.weizmann.ac.il/~itsik/RC4/Papers/FluhrerMcgrew.pdf
and there is exceptional bias in the second byte
http://www.wisdom.weizmann.ac.il/~itsik/RC4/Papers/bc_rc4.ps
The latter is the basis for breaking WEP
http://www.wisdom.weizmann.ac.il/~itsik/RC4/Papers/wep_attack.ps
These are not attacks on a reduced algorithm; they are attacks on the full 
algorithm.
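
That second-byte bias is easy to reproduce empirically. A pure-Python
sketch (illustrative, and slow) counting how often RC4's second output
byte is zero across random keys; an unbiased generator would give about
1/256, while the bias predicts about 2/256:

    import os
    from collections import Counter

    def rc4_second_byte(key):
        # Standard RC4 KSA plus two steps of the PRGA; return byte 2.
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) & 0xFF
            s[i], s[j] = s[j], s[i]
        i = j = out = 0
        for _ in range(2):
            i = (i + 1) & 0xFF
            j = (j + s[i]) & 0xFF
            s[i], s[j] = s[j], s[i]
            out = s[(s[i] + s[j]) & 0xFF]
        return out

    TRIALS = 100_000
    counts = Counter(rc4_second_byte(os.urandom(16)) for _ in range(TRIALS))
    print(f"P[byte2 == 0] = {counts[0] / TRIALS:.4f}  (unbiased: {1/256:.4f})")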


If you take these into consideration, can it be used correctly? I 
guess tossing the first few words gets rid of the exceptional bias, 
and maybe changing the key often gets rid of the statistical bias? Is 
this what you mean by used correctly?




Re: Fast MAC algorithms?

2009-07-24 Thread John Gilmore
 2) If you throw TCP processing in there, unless you are consistently going to
 have packets on the order of at least 1000 bytes, your crypto algorithm is
 almost _irrelevant_.

This is my experience, too.  And I would add "and lots of packets."
The only crypto overhead that really mattered in a real application
was the number of round-trip times it took to negotiate protocols and
keys.  Crypto's CPU time is very very seldom the limiting factor in
real end-user application performance.

 Could the lack of support for TCP offload in Linux have skewed these figures
 somewhat?  It could be that the caveat for the results isn't so much "this was
 done ten years ago" as "this was done with a TCP stack that ignores the
 hardware's advanced capabilities."

I have never seen a network card or chip whose advanced capabilities
included the ability to speed up TCP.  Most such advanced designs
actually ran slower than merely doing TCP in the Linux kernel using an
uncomplicated chip.  I saw a Patent Office procurement of Suns in the
'80s that demanded these slow TCP offload boards (I had to write the
bootstrap code for the project) even though the motherboard came with
an Ethernet chip and software stack that could run TCP *at wire speed*
all day and night -- for free.  The super whizzo board couldn't even
send back-to-back packets, as I recall.  Some government contractor
had added the TCP offload requirement, presumably to inflate the
price that they were adding a percentage markup to.

As a crypto-relevant aside, last year I looked at using the crypto
offload engine in the AMD Geode cpu chip to speed up Linux crypto
operations in the OLPC.  There was even a nice driver for it.
Summary: useless.  It had been designed by somebody who had no idea of
the architecture of modern software.  The crypto engine used DMA for
speed, used physical rather than virtual addresses, and stored the
keys internally in its registers -- so it couldn't work with virtual
memory, and couldn't conveniently be shared between two different
processes.  It was SO much faster to do your crypto by hand in a
shared library in a user process, than to cross into the kernel, copy
the data to be in contiguous memory locations (or manually translate
the addresses and lock down those pages into physical memory), copy
the keys and IVs into the accelerator, do the crypto, copy the results
back into virtual memory, and reschedule the user process.  In typical
applications (which don't always use the same key) you'd need to do
this dance once for every block encrypted, or perhaps if you were
lucky, for every packet.  Even kernel crypto wasn't worth doing
through the thing.  And the software libraries were not only faster,
they were also portable, running on anything, not just one obsolete
chip.

Hardware guys are just jerking off unless they spend a lot of time
with software guys AT THE DESIGN STAGE before they lay out a single
gate.  One stupid design decision can take away all the potential gain.
Every TCP offloader I've seen has had at least one.

John



Re: Fast MAC algorithms?

2009-07-24 Thread Peter Gutmann
[I realise this isn't crypto, but it's arguably security-relevant and arguably
 interesting :-)].

James Hughes hugh...@mac.com writes:

TOEs that are implemented in a slow processor in a NIC card have been shown
many times to be ineffective compared to keeping TCP in the fastest CPU
(where it is now).

The problem with statements like this is that they smack of the Linux
religious zealotry against TCP offload support in the kernel: "TOEs are bad
because we say they are, and we'll keep asserting this until you go away."  A
decade ago, during the Win2K development, Microsoft were measuring a 1/3
reduction in CPU usage just from TCP checksum offload.  Given the time frame
this was probably on 300MHz PIIs, but then again it'd be with late-90s
vintage NICs.  On the other hand I've seen even more impressive figures with
their more recent TCP chimney offload (which just moves more of the NDIS stack
onto the NIC; I think it came out around Server 2003).

Does this mean that MS figured out (a decade or so ago) how to make TOE
work while the OSS community has been too occupied telling everyone it doesn't
work to do anything about it?  There must be some reason for the difference
between the two camps.

Peter.



Re: Fast MAC algorithms?

2009-07-24 Thread james hughes


On Jul 24, 2009, at 1:30 PM, Peter Gutmann wrote:

[I realise this isn't crypto, but it's arguably security-relevant and
arguably interesting :-)].


As long as we think this is interesting... (although I respectfully 
disagree that there are any inherent security problems with TOE; maybe 
there are insecure implementations...).



James Hughes hugh...@mac.com writes:

TOEs that are implemented in a slow processor in a NIC card have been shown
many times to be ineffective compared to keeping TCP in the fastest CPU
(where it is now).

The problem with statements like this is that they smack of the Linux
religious zealotry against TCP offload support in the kernel: "TOEs are bad
because we say they are, and we'll keep asserting this until you go away."


There were a dozen or so protocol offload research projects that the 
US government funded in the 90s. All failed. Are the people who say 
TOEs are bad speaking out of zealotry, or standing on the shoulders of 
the people who ran those projects? At Network Systems, we partnered at 
the time with H. T. Kung of CMU to move TCP out of a really slow 
DECstation. The result? An accelerator that cost as much as the 
workstation and was faster only until the next processor version was 
available. Yes, we could have reduced it to a chip, but it wasn't. The 
takeaway was that improving the software is the gift that keeps on 
giving: Moore's law means you get a faster TCP every time the clock ticks.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1138

BTW, I am not a Linux bigot, just someone who got caught up in this 
issue more than a decade ago. I do not agree with your assertion, or 
the Wikipedia page, that this is Linux bigotry. I find that page 
horribly inaccurate and self-serving to the TOE manufacturing community.


What I learned from participating in a project that spent $5M of 
taxpayer money was that the protocol itself is a small fraction of the 
problem.



A decade ago, during the Win2K development, Microsoft were measuring a 1/3
reduction in CPU usage just from TCP checksum offload.  Given the time frame
this was probably on 300MHz PIIs, but then again it'd be with late-90s
vintage NICs.  On the other hand I've seen even more impressive figures with
their more recent TCP chimney offload (which just moves more of the NDIS
stack onto the NIC; I think it came out around Server 2003).

Does this mean that MS figured out (a decade or so ago) how to make TOE
work while the OSS community has been too occupied telling everyone it
doesn't work to do anything about it?  There must be some reason for the
difference between the two camps.


Offloading features like checksumming, fragmentation/reassembly (aka 
Large Segment Offload), packet categorization, splitting flows to 
different threads, etc. is not TOE.


TOE is offloading of the TCP stack. The thin line that is crossed is 
where the TCP state is kept. If the state is kept in the card, then 
the protocol to get the data reliably to the application has more 
corner cases (hence complexity), since the IP layer can be lossy and 
the socket layer cannot. In all the research, this has always been 
the case.


If there is something Windows has not learned, it could be that 
processing TCP should be simple and quick. Since the source code is 
not available, I don't know whether their software falls into the 
too-complicated camp or not... In the case of Chimney partial-stack 
offload, the state is in both places. Sounds simple and 
straightforward, right?


The case of iSCSI, where a complete protocol conversion is done (the 
card looks like a SCSI card, but the data goes out over TCP/IP), is 
a different story (which is also arguably still about solving the OS 
vendor's lack of software agility with hardware), but that is not the 
intent of this discussion.


I fully agree that offloading features that make the TCP processing 
easier is a good thing.


Back to crypto?




Jim



Re: Fast MAC algorithms?

2009-07-23 Thread Joseph Ashwood

--
From: Nicolas Williams nicolas.willi...@sun.com
Sent: Tuesday, July 21, 2009 10:43 PM
Subject: Re: Fast MAC algorithms?


But that's not what I'm looking for here.  I'm looking for the fastest
MACs, with extreme security considerations (e.g., warning, warning!
must rekey every 10 minutes)


There's a reason everyone is ignoring that requirement: rekeying in any 
modern system is more or less trivial. As an example, take AES: rekeying 
every 10 minutes yields a throughput of 99.999% of the original; there 
will be bigger differences depending on whether or not you move the mouse.
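
A back-of-the-envelope check of that claim, under assumed round numbers
(an AES key schedule of roughly a thousand cycles, bulk software AES at
~15 cycles/byte, a 100 Mbit/s link):

    # Assumed orders of magnitude, not measurements.
    key_schedule_cycles = 1_000          # one AES-128 key expansion
    cycles_per_byte = 15                 # bulk software AES, pre-AES-NI
    bytes_per_rekey = 100e6 / 8 * 600    # 100 Mbit/s for 10 minutes
    bulk_cycles = cycles_per_byte * bytes_per_rekey
    overhead = key_schedule_cycles / (key_schedule_cycles + bulk_cycles)
    print(f"rekey overhead: {overhead:.1e}")  # ~9e-12, far below 0.001%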



being possibly OK, depending on just how
extreme -- the sort of algorithm that one would not make REQUIRED to
implement, but which nonetheless one might use in some environments
simply because it's fast.


I would NEVER recommend it, let me repeat that I would NEVER recommend it, 
but Panama is a higher-performing design, IIRC about 8x the speed of the 
good recommendations. But DON'T USE PANAMA. You wanted a bad recommendation; 
Panama is a bad recommendation.


If you want a good recommendation that is faster, Poly1305-AES. You'll get 
some extra speed without compromising security.
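
For what the API looks like in practice, here is a short Poly1305-AES
sketch assuming PyCryptodome; each message needs a fresh 16-byte nonce
under a given key, and nonce reuse forfeits the security guarantee:

    from Crypto.Hash import Poly1305
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    key = get_random_bytes(32)  # 16-byte AES key plus the 16-byte r half
    mac = Poly1305.new(key=key, cipher=AES, data=b"attack at dawn")
    tag, nonce = mac.digest(), mac.nonce  # the nonce travels with the tag

    # The verifier recomputes the tag under the received nonce.
    check = Poly1305.new(key=key, cipher=AES, nonce=nonce,
                         data=b"attack at dawn")
    check.verify(tag)  # raises ValueError on mismatch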



For example, many people use arcfour in SSHv2 over AES because arcfour
is faster than AES.


I would argue that they use it because they are stupid. ARCFOUR should have 
been retired well over a decade ago: it is weak, it meets no reasonable 
security requirements, and in most situations it is not actually faster, due 
to the cache thrashing frequently induced by its large key expansion.



In the crypto world one never designs weak-but-fast algorithms on
purpose, only strong-and-preferably-fast ones.  And when an algorithm is
successfully attacked it's usually deprecated,


The general preference is to permanently retire them. The better algorithms 
are generally at least as fast; that's part of the problem you seem to be 
having. You're not understanding that secure is not the same word as slow; 
in fact, everyone has worked very hard at making the secure options at least 
as fast as the insecure ones.



new
ones tend to be slower because resistance against new attacks tends to
require more computation.


New ones tend to be faster than the old.
New ones are designed with more recent CPUs in mind.
New ones are designed with the best available knowledge on how to build 
security.
New ones are simpler by design.
New ones make use of everything that has been learned.


I realized this would make my question seem a
bit pointless, but hoped I might get a surprising answer :(


I think the answer surprised you more than you expected. You had hoped for 
some long-forgotten, extremely fast algorithm; what you've instead learned is 
that the long-forgotten algorithms were not only forgotten because of 
security, but that they were eclipsed on speed as well.


I've moved this to the end to finish on the point:

The SSHv2 AES-based ciphers ought to be RTI and
default choice, IMO, but that doesn't mean arcfour should not be
available.


I very strongly disagree. One of the fundamental assumptions of creating 
secure protocols is that sooner or later someone will bet their life on your 
work. This isn't an idle overstatement; it is an observation.
How many people bet their lives and lost because Twitter couldn't protect 
their information in Iran?

How many people bet their life's savings on SSL/TLS?
How many people trusted various options with their complete medical history?
How many people bet their life or freedom on the ability of PGP to protect 
them?


People bet their lives on security all the time; it is part of the job to 
make sure that bet is safe.
   Joe 




Re: Fast MAC algorithms?

2009-07-23 Thread Peter Gutmann
mhey...@gmail.com mhey...@gmail.com writes:

2) If you throw TCP processing in there, unless you are consistently going to
have packets on the order of at least 1000 bytes, your crypto algorithm is
almost _irrelevant_.
[...]
for a Linux 2.2.14 kernel, remember, this was 10 years ago.

Could the lack of support for TCP offload in Linux have skewed these figures
somewhat?  It could be that the caveat for the results isn't so much "this was
done ten years ago" as "this was done with a TCP stack that ignores the
hardware's advanced capabilities."

Peter.



Re: Fast MAC algorithms?

2009-07-23 Thread mhey...@gmail.com
On Thu, Jul 23, 2009 at 1:34 AM, Peter Gutmann pgut...@cs.auckland.ac.nz wrote:
 mhey...@gmail.com mhey...@gmail.com writes:

2) If you throw TCP processing in there, unless you are consistently going to
have packets on the order of at least 1000 bytes, your crypto algorithm is
almost _irrelevant_.
[...]
for a Linux 2.2.14 kernel, remember, this was 10 years ago.

 Could the lack of support for TCP offload in Linux have skewed these figures
 somewhat?  It could be that the caveat for the results isn't so much "this was
 done ten years ago" as "this was done with a TCP stack that ignores the
 hardware's advanced capabilities."

TCP offload would, of course, help reduce CPU load and make crypto
algorithm choice have more of an effect. With our tests, however, to
actually show an effect, we had to use large packet sizes which
reduced the impact of TCP - I know we were using 64K packets for some
tests. Boosting the packet size also affected cycles-per-byte for
NMAC-style algorithms because the outer function gets run less often
for a given amount of data (IPSec processing occurs outbound prior to
fragmentation).
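
That amortization is easy to model: with assumed constants (illustrative
only), cycles/byte for an NMAC/HMAC-style MAC is roughly a bulk per-byte
cost plus a fixed outer-hash cost divided by the packet size:

    # Toy cost model with assumed constants, not measured values.
    c_bulk, c_outer = 12, 1500           # cycles/byte and cycles/packet

    for n in (64, 576, 1500, 65536):     # packet sizes in bytes
        print(f"{n:6d} B -> {c_bulk + c_outer / n:6.2f} cycles/byte")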

We needed to reduce the impact of TCP because it still remained that
when doing something with the data, the cycles-per-byte of that
processing greatly impacts the percentage of slowdown your MAC
algorithm choice will have.

To throw another monkey wrench into the works: obviously, you may
think, "But what if I have a low-power application, trying to be green,
you know. So I want to use less processor-intensive cryptography to
save energy?" Well, I sat in the middle of a group of people doing
work for another DARPA project (SensIT) shortly after the ACSA
project. The SensIT project was for low-energy wireless sensors, in
which we experimented with different key exchange/agreement techniques
in an attempt to economize energy. As a throw-in result, the SensIT
people found it takes 3 orders of magnitude more energy to transmit or
receive data on a per-bit basis than it does to do AES+HMAC-SHA1 (it
came as a surprise to me back then that reception and transmission
take similar amounts of energy). Moral: don't scrimp on crypto to save
energy - at least for wireless; I don't know what it costs to send a
bit down a twisted pair or fiber.

The SensIT final report is available here:
http://www.cs.umbc.edu/courses/graduate/CMSC691A/Spring04/papers/nailabs_report_00-010_final.pdf.

-Michael Heyman



Re: Fast MAC algorithms?

2009-07-23 Thread Nicolas Williams
On Thu, Jul 23, 2009 at 05:34:13PM +1200, Peter Gutmann wrote:
 mhey...@gmail.com mhey...@gmail.com writes:
 2) If you throw TCP processing in there, unless you are consistently going to
 have packets on the order of at least 1000 bytes, your crypto algorithm is
 almost _irrelevant_.
 [...]
 for a Linux 2.2.14 kernel, remember, this was 10 years ago.
 
 Could the lack of support for TCP offload in Linux have skewed these figures
 somewhat?  It could be that the caveat for the results isn't so much "this was
 done ten years ago" as "this was done with a TCP stack that ignores the
 hardware's advanced capabilities."

How much NIC hardware does both, ESP/AH and TCP offload?  My guess: not
much.  A shame, that.

Once you've gotten a packet off the NIC to do ESP/AH processing, you've
lost the opportunity to use TOE.

Nico
-- 



Re: Fast MAC algorithms?

2009-07-23 Thread james hughes
Note for moderator: this is not crypto, but the perception that TOE is 
the solution to networking performance problems is dangerous to leave 
standing in the crypto community.


On Jul 23, 2009, at 11:45 PM, Nicolas Williams wrote:


On Thu, Jul 23, 2009 at 05:34:13PM +1200, Peter Gutmann wrote:

mhey...@gmail.com mhey...@gmail.com writes:
2) If you throw TCP processing in there, unless you are consistently going to
have packets on the order of at least 1000 bytes, your crypto algorithm is
almost _irrelevant_.
[...]
for a Linux 2.2.14 kernel, remember, this was 10 years ago.

Could the lack of support for TCP offload in Linux have skewed these figures
somewhat?  It could be that the caveat for the results isn't so much "this was
done ten years ago" as "this was done with a TCP stack that ignores the
hardware's advanced capabilities."


How much NIC hardware does both, ESP/AH and TCP offload?  My guess: not
much.  A shame, that.

Once you've gotten a packet off the NIC to do ESP/AH processing, you've
lost the opportunity to use TOE.


IPsec offload can have value. TOEs are far more controversial.

TOEs that are implemented in a slow processor on a NIC card have been 
shown many times to be ineffective compared to keeping TCP in the 
fastest CPU (where it is now). For vendors that can't optimize their 
TCP implementation (because it is just too complicated for them?), TOE 
is a siren call that distracts them from their real problem. Look at 
Van Jacobson's post of May 2000 entitled "TCP in 30 instructions":

http://www.pdl.cmu.edu/mailinglists/ips/mail/msg00133.html
There was a paper about this, but I am at a loss to find it. One can 
go even farther back to "An Analysis of TCP Processing Overhead" 
(Clark, Jacobson, Romkey, and Salwen, 1989), which states that "the 
protocol itself is a small fraction of the problem":

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5741

Back to crypto please.





Re: Fast MAC algorithms?

2009-07-22 Thread Joseph Ashwood

--
From: Nicolas Williams nicolas.willi...@sun.com
Subject: Fast MAC algorithms?


Which MAC algorithms would you recommend?


I didn't see the primary requirement: you never give a speed requirement. 
OMAC-AES-128 should function around 100 MB/sec, HMAC-SHA-512 about the same, 
HMAC-SHA1 about 150 MB/sec, and HMAC-MD5 about 250 MB/sec. I wouldn't 
recommend MD5, but in many situations it can be acceptable, and none of 
these make use of parallelism to achieve those speeds.
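
Those figures are easy to spot-check on your own hardware; a minimal
sketch, assuming PyCryptodome for OMAC/CMAC (OMAC1 is CMAC) while the
HMACs come from the Python standard library:

    import time, hmac
    from Crypto.Hash import CMAC
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    MSG, KEY = b"\x00" * (1 << 20), get_random_bytes(16)  # 1 MiB messages

    def mbps(fn, reps=10):
        # Average MB/s over reps one-shot MAC computations.
        t0 = time.perf_counter()
        for _ in range(reps):
            fn()
        return reps * len(MSG) / (time.perf_counter() - t0) / 1e6

    print("CMAC-AES-128 ",
          mbps(lambda: CMAC.new(KEY, ciphermod=AES, msg=MSG).digest()))
    for name in ("sha512", "sha1", "md5"):
        print(f"HMAC-{name:7s}",
              mbps(lambda h=name: hmac.new(KEY, MSG, h).digest()))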
   Joe 




Re: Fast MAC algorithms?

2009-07-22 Thread Jack Lloyd
On Tue, Jul 21, 2009 at 07:15:02PM -0500, Nicolas Williams wrote:
 I've an application that is performance sensitive, which can re-key very
 often (say, every 15 minutes, or more often still), and where no MAC is
 accepted after 2 key changes.  In one case the entity generating a MAC
 is also the only entity validating the MAC (but the MAC does go on the
 wire).  I'm interested in any MAC algorithms which are fast, and it
 doesn't matter how strong they are, as long as they meet some reasonable
 lower bound on work factor to forge a MAC or recover the key, say 2^64,
 given current cryptanalysis, plus a comfort factor.
[...]
 Which MAC algorithms would you recommend?

I'm getting the impression that key agility is important here, so one
MAC that comes to mind is CMAC with a block cipher with a fast key
schedule, like Serpent. (If for some reason you really wanted to do
something to make security auditors squirm, you could even cut Serpent
down to 16 rounds, which would increase the message processing rate by
about 2x and also speed up the key schedule. This seems like asking
for it to me, though.)
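
A sketch of the key-agility point, assuming PyCryptodome (which ships
CMAC but no Serpent, so AES-128 stands in for the fast-key-schedule
cipher): rekeying per message costs only one cipher key schedule plus
CMAC's two subkey derivations.

    from Crypto.Hash import CMAC
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    def mac_with_fresh_key(message):
        # A brand-new key per message: cheap when the key schedule is fast.
        key = get_random_bytes(16)
        tag = CMAC.new(key, ciphermod=AES, msg=message).digest()
        return key, tag

    key, tag = mac_with_fresh_key(b"rekey as often as you like")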

Another plausible answer might be Skein - it directly supports keying
and nonces (so you don't have to take the per-message overhead of the
extra hash as with HMAC), and has very good bulk throughput on 64-bit
CPUs.
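
Skein isn't in the Python standard library, but BLAKE2 is and is
likewise natively keyed, so it illustrates the same point: one pass over
the data with no HMAC-style double hashing, plus a salt slot for
nonce-like input.

    import hashlib, os

    key = os.urandom(32)
    salt = os.urandom(16)  # per-message salt playing the nonce role
    tag = hashlib.blake2b(b"message", key=key, salt=salt,
                          digest_size=16).digest()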

-Jack



Re: Fast MAC algorithms?

2009-07-22 Thread Nicolas Williams
On Wed, Jul 22, 2009 at 06:49:34AM +0200, Dan Kaminsky wrote:
 Operationally, HMAC-SHA-256 is the gold standard.  There's wonky stuff all
 over the place -- Bernstein's polyaes work appeals to me -- but I wouldn't
 really ship anything but HMAC-SHA-256 at present time.

Oh, I agree in general.  As far as new apps and standards work I'd make
HMAC-SHA-256 or AES, in an AEAD cipher mode, REQUIRED to implement and
the default.
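
For reference, the gold standard is two lines with the Python standard
library; a minimal sketch, with a constant-time comparison on the
verifying side:

    import hmac, hashlib, os

    key = os.urandom(32)
    tag = hmac.new(key, b"payload", hashlib.sha256).digest()
    # Verification must use a constant-time compare, not ==.
    ok = hmac.compare_digest(
        tag, hmac.new(key, b"payload", hashlib.sha256).digest())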

But that's not what I'm looking for here.  I'm looking for the fastest
MACs, with extreme security considerations (e.g., warning, warning!
must rekey every 10 minutes) being possibly OK, depending on just how
extreme -- the sort of algorithm that one would not make REQUIRED to
implement, but which nonetheless one might use in some environments
simply because it's fast.

For example, many people use arcfour in SSHv2 over AES because arcfour
is faster than AES.  The SSHv2 AES-based ciphers ought to be RTI and
default choice, IMO, but that doesn't mean arcfour should not be
available.

In the crypto world one never designs weak-but-fast algorithms on
purpose, only strong-and-preferably-fast ones.  And when an algorithm is
successfully attacked it's usually deprecated, put in the ash heap of
history.  But there is a place for weak-but-fast algos, as long as
they're not too weak.  Any weak-but-fast algos we might have now tend to
be old algos that turned out to be weaker than designed to be, and new
ones tend to be slower because resistance against new attacks tends to
require more computation.  I realized this would make my question seem a
bit pointless, but hoped I might get a surprising answer :(

Nico
-- 



Re: Fast MAC algorithms?

2009-07-22 Thread mhey...@gmail.com
On Wed, Jul 22, 2009 at 1:43 AM, Nicolas Williams nicolas.willi...@sun.com wrote:

 But that's not what I'm looking for here.  I'm looking for the fastest
 MACs, with extreme security considerations...In the crypto world
 one never designs weak-but-fast algorithms on purpose, only
 strong-and-preferably-fast ones.  And when an algorithm is
 successfully attacked it's usually deprecated, put in the ash heap of
 history.  But there is a place for weak-but-fast algos, as long as
 they're not too weak.

It just so happens that I worked on a DARPA-funded project about 10
years ago looking at the effects of any possible strength-vs-speed
trade-off available for different MACing algorithms. We built the
capability into FreeS/WAN's IPsec. Some of our MACs were so weak we
called them Partial MACs (PMACs). PMACs authenticated only randomly
selected pieces of the packet. We figured PMACs were good enough for
video - who cares if Eve can feed you a frame or two of partially
spoofed video, as long as you can't get enough to notice.

http://www.isso.sparta.com/documents/acsa_final_report.pdf

The major takeaways include:

1) HMAC-SHA1-96 can typically triple the amount of CPU required to
move IP packets through the kernel over a no-crypto option.
HMAC-MD5-96 can double it.

2) If you throw TCP processing in there, unless you are consistently
going to have packets on the order of at least 1000 bytes, your crypto
algorithm is almost _irrelevant_. TCP costs up to ~1000 cycles per
byte on 10 byte packets, 100 cycles per byte on 100 byte packets, and
only gets down to ~15 cycles per byte at 1000 byte packets. For
reference, HMAC-SHA1-96 takes about 25 cycles per byte for ~1000 byte
packets. These are PentiumII numbers for a Linux 2.2.14 kernel;
remember, this was 10 years ago. (See the sketch after this list.)

3) If your host is actually going to do something with the data you
receive, it is really, really hard to find something that the crypto
algorithm will affect. A coworker of mine struggled to find a real-world
desktop application in which you could actually see a result
(other than some numbers in a log file). Finally he found that viewing
a video remotely in an X window (that's uncompressed video) would have
occasional drops that become noticeable if you pick your video well.
Our video was of a circular radar screen with a rotating update line
(I think it came from a screen saver). With this contrived
application, we could change the MAC algorithm and see more or less
disturbance in the video.
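
Here is the sketch promised under point 2: a toy model, using the
figures quoted there (and, as a simplification, the ~1000-byte
HMAC-SHA1-96 cost at every size), of the MAC's share of combined
TCP+MAC cycles:

    # Figures from point 2 above; holding the MAC at 25 cycles/byte is a
    # simplification, since that was measured at ~1000-byte packets.
    tcp_cpb = {10: 1000, 100: 100, 1000: 15}  # TCP cycles/byte by size
    mac_cpb = 25                              # HMAC-SHA1-96

    for n in (10, 100, 1000):
        share = mac_cpb / (mac_cpb + tcp_cpb[n])
        print(f"{n:5d}-byte packets: MAC is {share:5.1%} of TCP+MAC cycles")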

I'd like to emphasize points 2 and 3. You need an application that
either doesn't use TCP, or that only uses TCP with MTU-sized packets,
to even want to care about crypto performance. I don't think the paper
points it out, but all our testing was done with two machines
connected directly to each other. Any out-of-order processing TCP
needs to do will only decrease the effect a MAC algorithm has. Also,
if you want to do _anything_ with the data other than ignore it, it
will only further decrease the effect the MAC algorithm has. We tried
timing FTP transfers, streaming an MPEG, and numerous other things
that I don't remember, but all these things had too much overhead to
allow the choice of MAC algorithm to be noticed. Ten years of kernel
network stack development and CPU improvements may have changed the
numbers slightly, but I believe you need a really specialized case,
probably including real-time requirements on marginal CPUs, before you
need to look at faster MAC algorithms.

Thanks for letting me reminisce about a really fun project (sprinkling
rdtsc around the Linux kernel, and getting Steve Kent upset (not
really) at our attempted subversion of IPsec intent - we ended up
doing it the way he wanted even though my way would have been cleaner
<grin/>).

-Michael Heyman
