On Thu, May 09, 2013 at 02:20:01PM +0200, Jilles Tjoelker wrote:
> I think architecture-specific memcmp() for i386 and amd64 can still be
> beneficial because of the fast unaligned access offered by these CPUs,
> which allows comparison of 4 or 8 bytes at a time. SSE2 allows
> comparison of 16 bytes at a time but is somewhat harder: not all i386
> CPUs support SSE2, unaligned access is slow on some older CPUs and it
> requires assembly so it only uses %xmm8-%xmm15 so rtld does not trash
> function parameters (or rtld needs to use non-SSE2 code).

FWIW, rtld is not allowed to modify any registers in the bind code called from the PLT trampoline. The C ABI is not mandated for functions resolved through the PLT, so our rtld takes care not to destroy even caller-save or scratch registers, at least on x86*.
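
For reference, the 8-bytes-at-a-time comparison mentioned in the quoted text can be sketched in plain C without touching any SSE registers. This is only an illustration, not the libc or rtld code: the name memcmp_words() is made up, and it assumes the compiler turns the fixed-size memcpy() calls into single unaligned loads, which is typical on i386 and amd64 but not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical word-at-a-time memcmp(); the name and structure are
 * assumptions made for illustration only.
 */
static int
memcmp_words(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1, *p2 = s2;

	/* Compare 8 bytes per iteration using unaligned loads. */
	while (n >= sizeof(uint64_t)) {
		uint64_t a, b;

		memcpy(&a, p1, sizeof(a));
		memcpy(&b, p2, sizeof(b));
		if (a != b)
			break;	/* the mismatch is inside this word */
		p1 += sizeof(a);
		p2 += sizeof(b);
		n -= sizeof(a);
	}

	/* Byte loop locates the first differing byte and handles the tail. */
	for (; n > 0; n--, p1++, p2++) {
		if (*p1 != *p2)
			return (*p1 - *p2);
	}
	return (0);
}
```

Falling back to the byte loop on a mismatch keeps the result independent of byte order, at the cost of rescanning at most 8 bytes.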