Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to be positive." The question was whether you can imagine a memcpy implementation that would fail to handle overlapping regions when the source address is *larger* than the destination. The question was *not* whether you can imagine a memcpy implementation that would fail to handle arbitrary overlapping regions.

Yes.

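#include <stddef.h>  /* size_t */

void * memcpy(void * dst, const void * src, size_t len) {
    /* Start one past the end of each buffer and copy back to front. */
    char * d = ((char *) dst) + len;
    const char * s = ((const char *) src) + len;

    while (len-- > 0) {
        *--d = *--s;
    }
    return dst;
}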

This is a fully conformant implementation of memcpy. Not sure why you'd implement it this way, but it's legal.
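
To make the failure concrete, here is a minimal sketch that runs the same back-to-front loop on the memcpy(ovec,ovec+num,8) pattern and compares the result with memmove. The names backward_copy and matches_memmove and the 16-byte test buffer are made up for illustration; nothing here claims that any real libc or ICC copies this way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the same back-to-front loop as the memcpy above. */
static void *backward_copy(void *dst, const void *src, size_t len) {
    char *d = (char *)dst + len;
    const char *s = (const char *)src + len;

    while (len-- > 0) {
        *--d = *--s;
    }
    return dst;
}

/*
 * Copies 8 bytes from buf+num to buf, once with backward_copy and once
 * with memmove, and returns 1 if both buffers end up identical.
 * Assumes 0 <= num <= 8 so that buf+num+8 stays inside the 16 bytes.
 */
static int matches_memmove(size_t num) {
    unsigned char a[16], b[16];
    size_t i;

    for (i = 0; i < 16; i++) {
        a[i] = b[i] = (unsigned char)i;
    }
    backward_copy(a, a + num, 8);
    memmove(b, b + num, 8);
    return memcmp(a, b, 16) == 0;
}
```

With num = 8 the regions do not overlap and the two results agree. With num = 4 the source is above the destination and the regions overlap: the backward loop overwrites bytes 4..7 before it reads them, so the first four bytes of the result come out as 8, 9, 10, 11 instead of 4, 5, 6, 7.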

The question is not how I would implement it, but why ICC would. What would be
a performance reason to implement something similar to this? But whatever, memmove it is...

See a). Inlining is believed/expected to be faster than a call to a function.

This is not always true, for example if inlining bloats the code so that it no longer fits into cache. Also, a single shared copy of the function lets all callers share branch prediction information.

Well, if one uses a designated intrinsic function, the compiler has a chance
to evaluate the trade-off and "decide" when it's appropriate to inline
or call a function, while in the case of memmove you're bound to call...

It is true in this case, I mean. At least on x86.

"This case?" Two 32-bit loads + two 32-bit stores [both gcc and icc 8
manage to inline it like this] vs. a call to a function to copy 8 bytes? But as said, whatever, memmove it is for cfb_enc... A.

