Re: 5x speedup for AES using SSE5?

2008-08-26 Thread Ilya Levin
Brian Gladman wrote:
 But a fully byte-oriented implementation runs at about 140 cycles/byte
 and here the S-Box substitution step is a significant bottleneck.
 ...
 It is also possible that the PPERM instruction could be used to speed up
 the Galois field calculations to produce the S-Box mathematically rather
 than by table lookup. I have tried this in the past but it has not
 proved competitive.  But PPERM looks interesting here as well.

This is where the following may be handy:
http://www.literatecode.com/2007/11/11/aes256/

It is a byte-oriented AES-256 implementation without S-box tables.
Although I doubt it can be sped up that much.

Regards,
Ilya
-- 
http://www.literatecode.com

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: 5x speedup for AES using SSE5?

2008-08-25 Thread Brian Gladman
Eric Young wrote:
 Eric Young wrote:
 I've not looked at it enough yet, but currently I'm doing AES encryption
 in about 140 cycles a block (call it 13 per round plus overhead) on an
 AMD64 (220e6 bytes/sec on a 2 GHz CPU) using normal instructions.
 Urk, correction, I forgot I've recently upgraded from a 2 GHz machine to
 2.5 GHz.
 So that should read about 182 cycles per block, and 18 cycles per round.
 I thought the number seemed strange :-(.  I tend to always quote numbers
 from a 2-3 second run encrypting a 4k buffer, not a machine cycle
 counter over one or two blocks, so I leave myself open to this kind of
 error :-(

The best figure I obtain on an AMD64 system is 11 cycles/byte, which
matches your results (you had me worried for a while with 9 cycles/byte!)

To go 5 times faster than this would mean close to 2 cycles/byte, a
speed that I find hard to believe without hardware acceleration.

But a fully byte-oriented implementation runs at about 140 cycles/byte
and here the S-Box substitution step is a significant bottleneck.  I too
think the PPERM instruction could be used for this and it seems possible
that this would produce large savings.  So 30 cycles/byte might well be
achievable in this case.

I hence wonder whether this is the comparison that AMD are making.

It is also possible that the PPERM instruction could be used to speed up
the Galois field calculations to produce the S-Box mathematically rather
than by table lookup. I have tried this in the past but it has not
proved competitive.  But PPERM looks interesting here as well.
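
For reference, a minimal sketch of what "producing the S-Box mathematically"
involves: invert the byte in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, then
apply the AES affine transform. The function names are made up for this
illustration, the inverse is computed naively as a^254, and nothing here is
tuned for speed or constant-time behaviour; it is not the code from any
implementation mentioned in this thread.

```c
#include <stdint.h>

/* Multiply by x in GF(2^8), reducing by the AES polynomial 0x11b. */
uint8_t gf_xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Shift-and-add multiplication in GF(2^8). */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = gf_xtime(a);
        b >>= 1;
    }
    return r;
}

/* Multiplicative inverse as a^254 by square-and-multiply; 0 maps to 0. */
uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1, p = a;
    unsigned e = 254;
    while (e) {
        if (e & 1)
            r = gf_mul(r, p);
        p = gf_mul(p, p);
        e >>= 1;
    }
    return r;
}

/* Rotate a byte left by n bits (1 <= n <= 7). */
uint8_t rotl8(uint8_t v, unsigned n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* AES S-box: field inverse followed by the affine transform. */
uint8_t aes_sbox_calc(uint8_t x)
{
    uint8_t b = gf_inv(x);
    return (uint8_t)(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3)
                       ^ rotl8(b, 4) ^ 0x63);
}
```

Calling aes_sbox_calc() for every input from 0 to 255 rebuilds the usual
256-entry table, which gives some idea of how much work each substitution
costs when the table is not used.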

   Brian Gladman



Re: 5x speedup for AES using SSE5?

2008-08-24 Thread Eric Young
Eric Young wrote:
 I've not looked at it enough yet, but currently I'm doing AES encryption
 in about 140 cycles a block (call it 13 per round plus overhead) on an
 AMD64 (220e6 bytes/sec on a 2 GHz CPU) using normal instructions.
Urk, correction, I forgot I've recently upgraded from a 2 GHz machine to
2.5 GHz.
So that should read about 182 cycles per block, and 18 cycles per round.
I thought the number seemed strange :-(.  I tend to always quote numbers
from a 2-3 second run encrypting a 4k buffer, not a machine cycle
counter over one or two blocks, so I leave myself open to this kind of
error :-(
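
The conversion behind these figures can be sketched as follows. The function
names are hypothetical, and the per-round figure assumes 10 rounds (AES-128),
which the thread does not state explicitly.

```c
/* Cycles spent per byte, from clock rate (Hz) and throughput (bytes/sec). */
double cycles_per_byte(double clock_hz, double bytes_per_sec)
{
    return clock_hz / bytes_per_sec;
}

/* AES processes 16-byte blocks, so scale the per-byte cost by 16. */
double cycles_per_block(double clock_hz, double bytes_per_sec)
{
    return 16.0 * cycles_per_byte(clock_hz, bytes_per_sec);
}

/* Average cost of one round; rounds = 10 is an assumption (AES-128). */
double cycles_per_round(double clock_hz, double bytes_per_sec, int rounds)
{
    return cycles_per_block(clock_hz, bytes_per_sec) / rounds;
}
```

With 220e6 bytes/sec this gives roughly 9 cycles/byte and 145 cycles/block
at 2 GHz, and roughly 11.4 cycles/byte and 182 cycles/block at 2.5 GHz.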

Still, looking further at the various SSE5 instructions, I'm having
difficulty seeing how to avoid instruction dependencies when using the
SIMD instructions (specifically using PPERM to implement the S-box).
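
To make the shape of the problem concrete, here is a scalar C model of a
permute-based S-box lookup. It assumes a permute that picks each of 16 output
bytes from a 32-byte source using the low five bits of a control byte; that
is a simplification for illustration, not a full description of PPERM, and
both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model of a 32-byte permute: out[i] = src[ctl[i] & 0x1f]. */
void permute32(uint8_t out[16], const uint8_t src[32], const uint8_t ctl[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = src[ctl[i] & 0x1f];
}

/* Look up 16 bytes in a 256-entry table using eight 32-byte permutes. */
void table_lookup16(uint8_t out[16], const uint8_t table[256],
                    const uint8_t in[16])
{
    uint8_t part[16];

    for (int i = 0; i < 16; i++)
        out[i] = 0;

    for (int k = 0; k < 8; k++) {
        /* Candidate results taken from table entries 32*k .. 32*k + 31. */
        permute32(part, table + 32 * k, in);

        for (int i = 0; i < 16; i++) {
            /* 0xff when the top three bits of in[i] select chunk k. */
            uint8_t mask = (uint8_t)-((in[i] >> 5) == k);
            out[i] |= part[i] & mask;
        }
    }
}
```

In this model one 16-byte substitution costs eight permutes plus the masking
and merging, and the next round cannot start until the merged result is
ready, which may be the kind of dependency that is hard to hide.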

eric
