How about merging your last known-working version from this morning into CVS and playing these optimization games later? :-)
The latest code only adds a couple of sanity checks, e.g. "if (nbytes==0 ...) return 0;" and "default: return 0;". Could it be something as simple as this?
You reordered most of the assembler parameters
This is preparation for the Windows port. Sorry...
and it's now pretty hard to get a reasonable diff between the last working version (the one from around midnight) and the current one.
I know, that's why I said "bear with me" :-) Do try replacing return 0; in padlock_aes_cipher with return 1;...
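A minimal sketch of why the value returned by a sanity check matters, assuming the OpenSSL EVP convention that a cipher's do_cipher callback returns 1 on success and 0 on failure (the EVP layer treats 0 as an error). Under that convention, an early "return 0;" for an empty request turns a harmless no-op into a failure, while "default: return 0;" for an unknown mode really is an error. toy_cipher and the TOY_* constants are hypothetical names for illustration, not the engine's API.

```c
#include <stddef.h>

/* Hypothetical mode identifiers, for illustration only. */
enum { TOY_ECB = 1, TOY_CBC, TOY_CFB, TOY_OFB };

/* Sketch of a do_cipher-style callback following the assumed convention:
   1 = success, 0 = failure. */
static int toy_cipher(int mode, unsigned char *out, const unsigned char *in, size_t nbytes)
{
    if (nbytes == 0)
        return 1;               /* empty request: nothing to do, but not an error */

    switch (mode) {
    case TOY_ECB:
    case TOY_CBC:
    case TOY_CFB:
    case TOY_OFB:
        (void)out;              /* the real encryption/decryption would go here */
        (void)in;
        return 1;
    default:
        return 0;               /* unsupported mode: a genuine failure */
    }
}
```

With this split, a caller that checks the return value sees success for a zero-length update and failure only for a mode the code cannot handle.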
I doubt that, e.g., the IV loading optimizations would have a noticeable speed impact anyway...
The IV change is not an optimization but a bug fix. I believe the IV was handled incorrectly in my previous version [which should show up on inputs larger than REALIGN_SIZE]. Bear with me :-)
What was wrong with loading the IV from ctx into cdata before calling xcrypt and saving it back there afterwards? (And the code was much more readable ;-)
Imagine a CBC encryption pass over a lot of data with all pointers unaligned. You copy the IV to cdata, copy REALIGN_SIZE bytes of input to the aligned buffer and run xcrypt in place. Where does %eax point on exit? At the last block of the aligned buffer, while the original IV copy in cdata remains unmodified. That last block of the aligned buffer is the current IV! Now suppose there are at least another REALIGN_SIZE bytes left in the same request: on the next iteration I overwrite the whole aligned buffer... along with the IV! Therefore the IV has to be copied elsewhere before the first memcpy in the loop. In other words, I believe it worked because you never had more than 2*REALIGN_SIZE bytes in one call.
A.
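The chaining problem described above can be reproduced in a small, self-contained sketch. It assumes a toy XOR "cipher" in place of AES/xcrypt and an arbitrary REALIGN_SIZE of 64 bytes; toy_xcrypt_cbc and cbc_encrypt_realigned are hypothetical names, and this is not the PadLock engine's code. The key step copies the last ciphertext block (the new IV) out of the aligned buffer before the next memcpy can overwrite it.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 16          /* AES block size in bytes */
#define REALIGN_SIZE 64   /* hypothetical bounce-buffer size; must be a multiple of BLOCK */

/* Toy stand-in for the hardware block cipher: XOR with a key.
   It is NOT AES; it only lets the CBC chaining logic run anywhere. */
static void toy_encrypt_block(unsigned char blk[BLOCK], const unsigned char key[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        blk[i] ^= key[i];
}

/* Mimics an in-place CBC xcrypt call: encrypts nblocks blocks of data,
   leaves iv untouched and returns a pointer to the last ciphertext block,
   i.e. the chaining value for the next call (the analogue of %eax on exit). */
static const unsigned char *toy_xcrypt_cbc(unsigned char *data, size_t nblocks,
                                           const unsigned char *iv,
                                           const unsigned char key[BLOCK])
{
    const unsigned char *prev = iv;
    for (size_t b = 0; b < nblocks; b++) {
        unsigned char *blk = data + b * BLOCK;
        for (int i = 0; i < BLOCK; i++)
            blk[i] ^= prev[i];          /* CBC: XOR with previous ciphertext (or IV) */
        toy_encrypt_block(blk, key);
        prev = blk;
    }
    return prev;
}

/* CBC-encrypts len bytes (a multiple of BLOCK) through an aligned bounce buffer,
   REALIGN_SIZE bytes per pass. On success iv holds the final chaining value
   and 1 is returned; a length that is not a multiple of BLOCK returns 0. */
static int cbc_encrypt_realigned(unsigned char *out, const unsigned char *in, size_t len,
                                 unsigned char iv[BLOCK], const unsigned char key[BLOCK])
{
    _Alignas(16) unsigned char aligned[REALIGN_SIZE];
    unsigned char cur_iv[BLOCK];        /* chaining value kept OUTSIDE aligned[] */

    if (len % BLOCK != 0)
        return 0;
    memcpy(cur_iv, iv, BLOCK);
    while (len > 0) {
        size_t chunk = len < REALIGN_SIZE ? len : REALIGN_SIZE;
        memcpy(aligned, in, chunk);     /* overwrites whatever aligned[] held before */
        const unsigned char *last = toy_xcrypt_cbc(aligned, chunk / BLOCK, cur_iv, key);
        /* The new IV lives in aligned[]; save it now, before the next pass's
           memcpy above clobbers it. */
        memcpy(cur_iv, last, BLOCK);
        memcpy(out, aligned, chunk);
        in += chunk;
        out += chunk;
        len -= chunk;
    }
    memcpy(iv, cur_iv, BLOCK);
    return 1;
}
```

If the loop instead kept only the pointer into aligned[] (the analogue of %eax) and reused it on the next pass, it would chain from freshly copied plaintext rather than the previous ciphertext, so output past the first REALIGN_SIZE bytes would generally no longer match a one-shot CBC encryption of the same input.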
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
