Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to
be positive." The question was whether you can imagine a memcpy
implementation that would fail to handle overlapping regions when the
source address is *larger* than the destination. The question was
*not* whether you can imagine a memcpy implementation that would fail
to handle arbitrary overlapping regions.
Yes.
    void * memcpy(void * dst, const void * src, size_t len) {
        /* Start one past the end of both buffers and copy backwards. */
        char * d = ((char *) dst) + len;
        const char * s = ((const char *) src) + len;
        while (len-- > 0) {
            *--d = *--s;
        }
        return dst;
    }
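This is a fully conformant implementation of memcpy. It copies from the
highest address down, so when src is larger than dst and the regions
overlap, it overwrites source bytes before it has read them. Not sure
why you'd implement it this way, but it's legal.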
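To make the failure concrete, here is a minimal sketch that runs the same backward loop next to memmove on the memcpy(ovec,ovec+num,8) pattern. The names backward_copy and backward_copy_garbles are made up for illustration (the routine is renamed so it does not clash with the library memcpy), and the 16-byte test buffer is an assumption, not anything taken from the OpenSSL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the backward-copying loop, renamed so it does not
   clash with the library memcpy. */
static void *backward_copy(void *dst, const void *src, size_t len)
{
    char *d = (char *)dst + len;
    const char *s = (const char *)src + len;
    while (len-- > 0)
        *--d = *--s;
    return dst;
}

/* Hypothetical check: returns 1 if backward_copy produces a different
   result than memmove for an 8-byte shift down by num bytes, else 0. */
static int backward_copy_garbles(size_t num)
{
    char a[16], b[16];
    int i;

    assert(num <= 8);               /* keep src + 8 inside the buffer */
    for (i = 0; i < 16; i++)
        a[i] = b[i] = (char)('A' + i);

    memmove(a, a + num, 8);         /* defined for overlapping regions */
    backward_copy(b, b + num, 8);   /* reads bytes it already overwrote */
    return memcmp(a, b, 16) != 0;
}
```

With num between 1 and 7 the two regions overlap and the results differ; with num equal to 8 there is no overlap and both agree.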
The question is not how I would implement it, but why ICC would. What
would be a performance reason to implement something similar to
this? But whatever, memmove it is...
See a). Inlining is believed/expected to be faster than a call to a
function.
This is not always true, for example if the inlining bloats the code so
that it no longer fits into cache. Also, shared copies of the function
can share branch prediction information.
Well, if one uses a designated intrinsic function, the compiler has a
chance to evaluate the trade-off and "decide" when it's appropriate to
inline or call a function, while in the case of memmove you're bound to
call...
It is true in this case, I mention. At least on the x86.
"This case?" Two 32-bit loads + two 32-bit stores [both gcc and icc 8
manage to inline it like this] vs. a call to a function to copy 8 bytes?
But as said, whatever, memmove for cfb_enc it is... A.
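For what it's worth, the two-loads-then-two-stores shape can be written portably, and it is safe for overlap in either direction as long as both words are loaded before anything is stored. A minimal sketch, assuming a hypothetical helper named copy8 (not an OpenSSL function); whether a given compiler turns the fixed-size memcpy calls into plain loads and stores is an expectation, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 8 bytes even if src and dst overlap.
   Both 32-bit words are loaded into locals before anything is stored,
   so the direction of the overlap does not matter. Each memcpy here is
   between a local and the buffer, which never overlap. */
static void copy8(unsigned char *dst, const unsigned char *src)
{
    uint32_t lo, hi;

    assert(dst != NULL && src != NULL);
    memcpy(&lo, src, 4);        /* load low word  */
    memcpy(&hi, src + 4, 4);    /* load high word */
    memcpy(dst, &lo, 4);        /* store low word  */
    memcpy(dst + 4, &hi, 4);    /* store high word */
}
```

Used as copy8(ovec, ovec + num), this gives the same result as memmove(ovec, ovec + num, 8) without depending on how the library memcpy walks the buffer.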
______________________________________________________________________
OpenSSL Project http://www.openssl.org