> > and produces correct result on all platforms at the nominal cost (I
> > estimate at most 5% across all platforms) of collecting 32-bit values
> > with 4 byte-loads and accompanying shift and or operations
> > (or couple of rotates and or if compiled with Microsoft C).
> 
> "Well, I estimate the new implementation will do better than 5%."  :)

Well, my original claim was about the cost of assembling 32-bit
values from 4 single-byte loads, which is no more than 5% relative to
any particular implementation; it was not a comparison between these
two implementations. But in either case I really fail to see how the
proposed implementation would be much faster than the one already
present in the tree. They're practically identical. The only
differences are the byte order of the S-boxes (and therefore the order
of shifts) and the way the final round is handled, plus the already
mentioned fact that the original implementation operates on automatic
variables.
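
To make the byte-load point concrete, here is a minimal sketch of
what I mean by assembling a 32-bit value from 4 byte loads. The helper
names load_u32_be and store_u32_be are made up for illustration and
are not taken from either implementation; big-endian order is assumed
here, the other S-box byte order would simply reverse the shifts.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit word from 4 single-byte loads.
 * Works for any pointer alignment and any host byte order. */
static uint32_t load_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Hypothetical counterpart: write a 32-bit word back as 4 byte stores. */
static void store_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >>  8);
    p[3] = (unsigned char)v;
}
```

Since only unsigned char accesses are made, no word-sized memory
reference ever touches src or dst directly.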

> Perhaps, I'll produce some actual numbers using OpenSSL and
> both implementations to prove my case.

Do keep in mind that OpenSSL is a cross-platform toolkit, which means
that we may have to face and resolve a trade-off.

> > The proposed code is IA-32 specific as IA-32 is the only
> > platform immune
> > to misaligned memory references.
> 
> I don't believe this is true, but I'd be happy to see a
> specific example.

RIJNDAEL_ecb_encrypt(const unsigned char *src,
                     unsigned char *dst,
                     long size,
                     const RIJNDAEL_KEY *key,
                     int encrypt)
{
        if (encrypt)
        {
                while (size >= RIJNDAEL_BLOCK)
                {
                        RIJNDAEL_encrypt((const RIJNDAEL_WORD*) src,
                                         (RIJNDAEL_WORD*) dst,
                                         key);
...

RIJNDAEL_cbc_encrypt(const unsigned char *src,
                     unsigned char *dst,
...
        if (encrypt)
        {
                while (size >= RIJNDAEL_BLOCK)
                {
                        XOR_BLOCK(dst, src, iv);
                        RIJNDAEL_encrypt((const RIJNDAEL_WORD*) dst,
                                         (RIJNDAEL_WORD*) dst,
                                         key);

If either src or dst is misaligned, the code bombs with a bus error on
all platforms but IA-32. Well, it doesn't bomb on Alpha, which handles
misaligned access in a trap handler, but since it's a trap, the
performance drops below any reasonable value, which makes you badly
wish the data were aligned.
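
For illustration only, one way to keep a word-oriented block routine
and still accept arbitrary byte pointers is to bounce each block
through an aligned temporary. This is a sketch under assumptions, not
code from either implementation: BLOCK_BYTES and word_t stand in for
RIJNDAEL_BLOCK and RIJNDAEL_WORD, and toy_block_encrypt is a
placeholder XOR transform, not Rijndael. Note that memcpy keeps the
host byte order, so this addresses alignment only, not endianness.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16          /* assumed block size in bytes */
typedef uint32_t word_t;        /* assumed word type */

/* Placeholder for a word-oriented block routine (NOT Rijndael):
 * it simply XORs each input word with a key word. */
static void toy_block_encrypt(const word_t *in, word_t *out,
                              const word_t *key)
{
    size_t i;
    for (i = 0; i < BLOCK_BYTES / sizeof(word_t); i++)
        out[i] = in[i] ^ key[i];
}

/* ECB loop that never casts src or dst to a word pointer.
 * Each block is copied into an aligned temporary first. */
static void ecb_encrypt_any_alignment(const unsigned char *src,
                                      unsigned char *dst,
                                      long size,
                                      const word_t *key)
{
    word_t tmp[BLOCK_BYTES / sizeof(word_t)];   /* naturally aligned */

    while (size >= BLOCK_BYTES) {
        memcpy(tmp, src, BLOCK_BYTES);          /* byte-wise copy in */
        toy_block_encrypt(tmp, tmp, key);
        memcpy(dst, tmp, BLOCK_BYTES);          /* byte-wise copy out */
        src  += BLOCK_BYTES;
        dst  += BLOCK_BYTES;
        size -= BLOCK_BYTES;
    }
}
```

The two extra copies per block are the price of this approach;
whether that or the 4 byte-loads is cheaper on a given platform is
exactly the kind of trade-off that would have to be measured.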

Andy.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
