Hi,

>> - why full unroll?
> 
> Just because the unrolled code is not too long.

As for the non-interleaved loop, the reasoning is that a folded loop can
be inlined in several places to spare a few cycles of call overhead. Of
course this is under the premise that it is as fast as the unrolled one.
Intel CPUs used to be very good at small loops, which is why I dared to
fold the loop. Of course it doesn't have to be the case here, and if the
unrolled loop proves to be faster, the inline code will have to be
replaced with calls.

>> - why not encode all aes instructions with .byte?
> 
> Just want to encode all aes instructions after some review. Now I think
> maybe we can define aes instructions as perl function and do encoding
> via perl.

It's done at the end of the script.
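
To illustrate what such an encoding step amounts to, here is a minimal
sketch. It is written in Python rather than perl purely to keep it
self-contained, and it is not the code from the script: the names
encode_aes and AES_OPCODES are hypothetical, and it assumes only the
register-to-register forms with operands given in AT&T order (source
first, destination second).

```python
# Hypothetical helper, not taken from the actual script.
# Third opcode byte of each AES-NI instruction (all are 66 0F 38 xx /r).
AES_OPCODES = {
    "aesimc": 0xDB,
    "aesenc": 0xDC,
    "aesenclast": 0xDD,
    "aesdec": 0xDE,
    "aesdeclast": 0xDF,
}


def encode_aes(mnemonic, src, dst):
    """Return a .byte directive for `mnemonic %xmm<src>,%xmm<dst>`."""
    if not (0 <= src <= 15 and 0 <= dst <= 15):
        raise ValueError("register number out of range")
    opcode = AES_OPCODES[mnemonic]
    encoding = [0x66]  # mandatory operand-size prefix
    # REX.R extends the destination (ModRM.reg), REX.B the source (ModRM.rm).
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)
    if rex != 0x40:
        encoding.append(rex)  # only needed for %xmm8-%xmm15
    encoding += [0x0F, 0x38, opcode]
    # ModRM with mod=11 (register-direct), reg=dst, rm=src.
    encoding.append(0xC0 | ((dst & 7) << 3) | (src & 7))
    return ".byte\t" + ",".join("0x%02x" % b for b in encoding)


print(encode_aes("aesenc", 1, 0))  # aesenc %xmm1,%xmm0
```

The point of emitting .byte is that the module then assembles with
assemblers that do not know the AES mnemonics.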

> I will test your code on real machine.

Is there a real machine? Would you care to perform several tests, so
that we can sort out what's optimal? I mean folded vs. unrolled, and
then I wonder if my use of .align directives is excessive in *crypt1...
I don't demand actual figures [in case you can't disclose them], only
whether and how performance is affected... If so, we can proceed
off-list if desired.

> And at least you can test the
> code with an emulator: SDE,

That's how the code was tested; every code branch was explicitly exercised.

> BTW: you want me to prepare the patch or you prepare the patch yourself?

I'll manage it myself. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
