Do you mean "quadruplicating" the realigner code for each case?
Yes. Switches are commonly implemented with indirect jumps, which are more expensive. "Quadruplicating" the realigner results in about 700 extra bytes, or an extra 25% of binary code.
OK, but I'll move the repeating code to a macro for better readability if you wouldn't mind.
Note that in the latest version there are more variants with respect to iv! This is to reflect the way iv is treated by the hardware.
Some questions:
1) Why don't you like uint8_t/uint32_t/...? When dealing with CPU instructions I prefer to use these "explicit" types instead of "unsigned int".
I suppose I'm a bit old-fashioned :-) But the point is that there is no guarantee that inttypes.h is present on all systems we support. At the very least there is none on Windows...
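To illustrate the portability point, here is a minimal sketch of how explicit-width types could be had without inttypes.h, using only limits.h. The names padlock_u8 and padlock_u32 are hypothetical and do not appear in the patch; this is one possible approach, not what the engine actually does.

```c
#include <limits.h>

/* Hypothetical type names, chosen for illustration only. */
#if UCHAR_MAX == 0xff
typedef unsigned char padlock_u8;
#else
# error "no 8-bit unsigned type on this platform"
#endif

/* Pick whichever basic unsigned type is exactly 32 bits wide. */
#if UINT_MAX == 0xffffffffUL
typedef unsigned int padlock_u32;
#elif ULONG_MAX == 0xffffffffUL
typedef unsigned long padlock_u32;
#else
# error "no 32-bit unsigned type on this platform"
#endif

/* Compile-time sanity check: the array size is negative if the width is wrong. */
typedef char padlock_u32_size_check[sizeof(padlock_u32) == 4 ? 1 : -1];
```

The selection happens entirely in the preprocessor, so it should work on any compiler that provides limits.h.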
2) Would you mind if I reformat some parts a little bit? E.g. put { }
after else? I know this way it looks cooler, but... ;-)
+ if ((realign_in_loop=out_misaligned))
+ out = realign;
+ else out = out_arg,
+ realign_in_loop = inp_misaligned;
Sure!
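For reference, a sketch of how the quoted fragment might read once braces are added and the assignment is taken out of the condition. The wrapper function select_out() and its parameter list are assumptions made only so the snippet compiles on its own; the variable names are the ones from the patch.

```c
#include <stddef.h>

/* Hypothetical helper isolating the buffer-selection step. */
static void *select_out(void *out_arg, void *realign,
                        int inp_misaligned, int out_misaligned,
                        int *realign_in_loop)
{
    void *out;

    if (out_misaligned) {
        /* Output is misaligned: write into the aligned scratch buffer. */
        *realign_in_loop = out_misaligned;
        out = realign;
    } else {
        /* Output is aligned: write directly; realign only if input needs it. */
        *realign_in_loop = inp_misaligned;
        out = out_arg;
    }
    return out;
}
```

The behaviour is meant to be identical to the original; only the layout differs.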
3) Clearing the buffer...
+ /* Clean the realign buffer if it was used */
+ if (realign_in_loop && out_misaligned) {
+     volatile unsigned long *p=realign;
+     size_t n=REALIGN_SIZE/sizeof(*p);
+     while (n--) *p++=0;
+ }
Using bzero() leads to inline assembler as well. Are those few less instructions in your code worth not using the standard function?
Without explicit "volatilization" of 'realign' ('out' in the latest version) the compiler is free to optimize the clearing away.
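To spell the point out: stores to a buffer that is never read again are dead stores, so a plain memset() or bzero() at the end of a function may legally be dropped, whereas accesses through a volatile-qualified pointer must be performed. A minimal standalone sketch of the loop from the patch follows; the function name wipe_realign() and the REALIGN_SIZE value are assumptions for illustration only.

```c
#include <stddef.h>

#define REALIGN_SIZE 512   /* assumed value, for illustration only */

/* Zero the scratch buffer through a volatile pointer so that the
 * stores cannot be eliminated as dead by the optimizer. */
static void wipe_realign(unsigned long *buf)
{
    volatile unsigned long *p = buf;
    size_t n = REALIGN_SIZE / sizeof(*p);

    while (n--)
        *p++ = 0;
}
```

Whether a given compiler would really remove the non-volatile version depends on the optimization level and on what it can prove about the buffer, so the volatile loop is the conservative choice.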
4) While we are at the optimization talks - in the compiled (-O2)
padlock_aes_cipher() there still remain calls to memcpy(). Everything
else is inlined. The trick to convince GCC to inline memcpy():
memcpy((long *)dst, (long *)src, len);
instead of using void* or char* pointers. Should I do it?
Sure.

A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
