Re: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

Andy Polyakov Fri, 08 Jul 2005 02:30:12 -0700

Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to be positive." Question was can you imagine memcpy implementation that would fail to handle overlapping regions when source address is *larger* than destination? Question was *not* if you can imagine memcpy implementation that would fail to handle arbitrary overlapping regions.
Yes.

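#include <stddef.h>

/* Copies from the last byte down to the first, so an overlapping
 * copy with src > dst reads bytes that were already overwritten. */
void * memcpy(void * dst, const void * src, size_t len) {
    char * d = ((char *) dst) + len;
    const char * s = ((const char *) src) + len;

    while (len-- > 0) {
        *--d = *--s;
    }
    return dst;
}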
This is a fully conformant implementation of memcpy. Not sure why you'd implement it this way, but it's legal.
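
To make the failure concrete, here is a minimal sketch that runs the same backward loop on the memcpy(ovec,ovec+num,8) pattern and compares the result with memmove. The names backward_copy and backward_matches_memmove and the 16-byte test buffer are assumptions made for illustration; none of them come from cfb_enc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the same backward loop as the memcpy quoted above. */
static void *backward_copy(void *dst, const void *src, size_t len) {
    char *d = (char *)dst + len;
    const char *s = (const char *)src + len;

    while (len-- > 0) {
        *--d = *--s;
    }
    return dst;
}

/* Hypothetical helper: shift 8 bytes down by num inside a 16-byte buffer,
 * once with the backward loop and once with memmove.
 * Returns 1 if both buffers end up identical, 0 otherwise. */
int backward_matches_memmove(size_t num) {
    unsigned char a[16], b[16];
    size_t i;

    assert(num <= 8); /* keep src + 8 inside the 16-byte buffers */

    for (i = 0; i < 16; i++) {
        a[i] = b[i] = (unsigned char)i;
    }
    backward_copy(a, a + num, 8); /* overlapping when 0 < num < 8 */
    memmove(b, b + num, 8);       /* defined for any overlap */
    return memcmp(a, b, 16) == 0;
}
```

With these assumptions the helper returns 0 for every num from 1 to 7, because the backward loop reads bytes it has already overwritten, and returns 1 for num 0 or 8, where the regions are identical or disjoint.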


Question is not how I implement it, but why ICC would. What would be a
performance reason to implement something similar to this... But
whatever, memmove it is...

See a). Inlining is believed/expected to be faster than call to a function.
This is not always true. If the inlining causes the code size to bloat and no longer fit into cache, for example. Also, shared copies of the function can share branch prediction information.


Well, if one uses designated intrinsic function, compiler has a chance
to evaluate the trade-off and "decide" when it's appropriate to inline
or call a function, while in case of memmove you're bound to call...

It is true in this case, I mention.  At least on the x86.


"This case?" Two 32-bit loads + two 32-bit stores [both gcc and icc 8

manage to inline it like this] vs. call to a function to copy 8 bytes?
But as said, whatever, memmove for cfb_enc is it... A.
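
For reference, a sketch of what "two 32-bit loads + two 32-bit stores" can look like when written out by hand. It assumes both loads complete before either store, which is what makes the copy safe for overlapping regions; whether a given compiler orders its inlined code this way is not something this thread establishes. The name copy8_load_then_store is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 8 possibly overlapping bytes by reading both
 * 32-bit halves into locals before writing either half back. */
void copy8_load_then_store(unsigned char *dst, const unsigned char *src) {
    uint32_t lo, hi;

    assert(dst != NULL && src != NULL);

    /* Each small memcpy moves between the buffer and a local, so none of
     * them overlaps; this also avoids alignment and aliasing problems. */
    memcpy(&lo, src, 4);
    memcpy(&hi, src + 4, 4);
    memcpy(dst, &lo, 4);
    memcpy(dst + 4, &hi, 4);
}
```

Because all eight source bytes are in locals before the first store, the result matches memmove for any offset between src and dst.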


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [EMAIL PROTECTED]
