Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
> > Just because the unrolled code is not too long.
> 
> As for non-interleaved loop. Reasoning is that folded loop can be
> inlined in several places to spare few cycles on call overhead. Of
> course this is under premise that it is as fast as unrolled one. Intel
> CPUs used to be very good at small loops, which is why I dared to fold
> the loop. Of course it doesn't have to be the case here and if unrolled
> loop will be proved to be faster, inline code will have to be replaced
> with calls.

Sounds reasonable.

> >> - why not encode all aes instructions with .byte?
> > 
> > Just want to encode all aes instructions after some review. Now I think
> > maybe we can define aes instructions as perl function and do encoding
> > via perl.
> 
> It's done at the end of script.

Yes. Thanks.

> > I will test your code on real machine.
> 
> There is real machine? Would you care to perform several tests, so that
> we can sort out what's optimal? I mean the folded vs. unrolled, then I
> wonder if my use of .aligns is excessive in *crypt1... I don't demand
> actual figures [in case you can't disclose them], only if/how
> performance is affected... If yes, we can proceed off-list if so desired.

OK. I will do these tests.

1. folded vs. unrolled
2. .align vs no .align in *crypt1

Are there any other tests to add?

I will run the tests with "openssl speed" tomorrow and send you the
results.

> > BTW: you want me to prepare the patch or you prepare the patch yourself?
> 
> I'll manage it myself. A.

Can you send me the full patch so that I can test it?

Best Regards,
Huang Ying
