Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
> > Just because the unrolled code is not too long.
>
> As for non-interleaved loop. Reasoning is that folded loop can be
> inlined in several places to spare few cycles on call overhead. Of
> course this is under premise that it is as fast as unrolled one. Intel
> CPUs used to be very good at small loops, which is why I dared to fold
> the loop. Of course it doesn't have to be the case here and if unrolled
> loop will be proved to be faster, inline code will have to be replaced
> with calls.

Sounds reasonable.

> >> - why not encode all aes instructions with .byte?
> >
> > Just want to encode all aes instructions after some review. Now I think
> > maybe we can define aes instructions as perl function and do encoding
> > via perl.
>
> It's done at the end of script.

Yes. Thanks.

> > I will test your code on real machine.
>
> There is real machine? Would you care to perform several tests, so that
> we can sort out what's optimal? I mean the folded vs. unrolled, then I
> wonder if my use of .aligns is excessive in *crypt1... I don't demand
> actual figures [in case you can't disclose them], only if/how
> performance is affected... If yes, we can proceed off-list if so desired.

OK. I will do these tests:

1. folded vs. unrolled
2. .align vs. no .align in *crypt1

Are there any other tests to add? I will test with "openssl speed" tomorrow
and send you the results.

> > BTW: you want me to prepare the patch or you prepare the patch yourself?
>
> I'll manage it myself.

Can you send me the full patch so I can test it?

Best Regards,
Huang Ying