Brian Gladman wrote:
> But a fully byte-oriented implementation runs at about 140 cycles/byte,
> and here the S-Box substitution step is a significant bottleneck.
> ...
> It is also possible that the PPERM instruction could be used to speed up
> the Galois field calculations to produce the S-Box mat
Eric Young wrote:
> Eric Young wrote:
>> I've not looked at it enough yet, but currently I'm doing an AES round
>> in about 140 cycles a block (call it 13 per round plus overhead) on an
>> AMD64 (220e6 bytes/sec on a 2 GHz CPU) using normal instructions.
> Urk, correction, I forgot I've recently up
Eric Young wrote:
> I've not looked at it enough yet, but currently I'm doing an AES round
> in about 140 cycles a block (call it 13 per round plus overhead) on an
> AMD64 (220e6 bytes/sec on a 2 GHz CPU) using normal instructions.
Urk, correction, I forgot I've recently upgraded from a 2 GHz machin
Paul Crowley wrote, On 24/8/08 1:00 AM:
http://www.ddj.com/hpc-high-performance-computing/201803067
[...] However, glancing through the SSE5 specification, I
can't see at all how such a dramatic speedup might be achieved.
A commenter on Slashdot hinted at the vector permutation instructions,
s