On Wed, 15 Nov 2017, Mark Millard wrote:

Bruce Evans brde at optusnet.com.au wrote on
Tue Nov 14 12:41:50 UTC 2017 :

- memcpy.3.  It was already documented in the BUGS section that memcpy() is
   implemented as bcopy() and therefore the "strings" "may overlap" [then
   portability considerations due to this].  This is buggier than before:
   - old bug: only the MI implementation of memcpy() clearly implements
     memcpy() as bcopy().  The amd64, i386 and arm MD implementations
     do the same.

I was confused between arm and mips and only fixed this in some places.
It is mips that does the same as amd64 and i386.  arm has different MD code,
and the others have no md code.

head/sys/arm/arm/support.S ( -r283366 ) has

394     ENTRY(bcopy)
395             /* switch the source and destination registers */
396             eor     r0, r1, r0
397             eor     r1, r0, r1
398             eor     r0, r1, r0
399     EENTRY(memmove)
400             /* Do the buffers overlap? */
401             cmp     r0, r1
402             RETeq           /* Bail now if src/dst are the same */
403             subcc   r3, r0, r1      /* if (dst > src) r3 = dst - src */
404             subcs   r3, r1, r0      /* if (src > dsr) r3 = src - dst */
405             cmp     r3, r2          /* if (r3 < len) we have an overlap */
406             bcc     PIC_SYM(_C_LABEL(memcpy), PLT)
. . .

I didn't look closely at the arm code, and just noticed more bugs in it.

Falling through like that breaks at least profiling.  Only ENTRY() does
_PROF_PROLOGUE, so profiling of memmove() seems to be broken.

A less worse way to use EENTRY() is ENTRY() then 1 more more EENTRY()'s
with no code in between.  Then the names are just aliases.  Profiling
can't distinguish the aliases, but at least all entries get counted.
amd64 and i386 use ALTENTRY() which handles this as well as possible
(not very well -- like a tail call -- too much gets charged to the
tail function.  sparc64 also has ALTENTRY() but not the complications
needed for it to work only as badly as a tail call.

It is a style bug for memmove() to exist in the kernel.  Especially an MD
version of it.  It is bad enough that libkern has an MI memmove().  This
is not the same as the libc/string one (another style bug), but just
implements memmove() by calling bcopy().  Profiling works right for this
unless it the function is over-optimized to a tail call when it works
like ALTENTRY().

head/lib/libc/arm/string/memmove.S ( -r288373 ) has:

37      #ifndef _BCOPY
38      /* LINTSTUB: Func: void *memmove(void *, const void *, size_t) */
39      ENTRY(memmove)
40      #else
41      /* bcopy = memcpy/memmove with arguments reversed. */
42      /* LINTSTUB: Func: void bcopy(void *, void *, size_t) */
43      ENTRY(bcopy)
44              /* switch the source and destination registers */
45              eor     r0, r1, r0
46              eor     r1, r0, r1
47              eor     r0, r1, r0
48      #endif

This is the correct way to do it -- separate object files with duplication
in source files avoided using ifdefs.

49              /* Do the buffers overlap? */
50              cmp     r0, r1
51              it      eq
52              RETeq           /* Bail now if src/dst are the same */
53              ite     cc
54              subcc   r3, r0, r1      /* if (dst > src) r3 = dst - src */
55              subcs   r3, r1, r0      /* if (src > dsr) r3 = src - dst */
56              cmp     r3, r2          /* if (r3 < len) we have an overlap */
57              bcc     PIC_SYM(_C_LABEL(memcpy), PLT)
. . .

It looks to me like bcopy and memmove call memcpy under
some conditions, not the other way around. I did not
notice any code in memcpy going back to bcopy (or
memmove) in either area. (That might have lead to
recursion as things are.)

The above only shows memmove() calling bcopy(), only in the kernel.  This
would be invalid in userland since memmove() is Standard but bcopy() is
supposed to be usable by applications (unless __BSD_VISIBLE).  The
separate object files in userland make this not a problem.  Otherwise,
bcopy and memmove in the same object file would cause namespace problems
even if bcopy is implemented as a [tail] call or fall-through to memmove.

...
Another issue is if compilers (clang, gcc) are well
controlled for code substitutions to preserve
supposed FreeBSD rules about where overlaps
are well defined. The library code might not be all
there is to it as far as behavior goes for
compiled C/C++ source code.

Do you mean the namespace stuff?

head/sys/arm64/arm64/ is similar for bcopy/memmove
calling memcpy (but head/lib/libc/amd64/string/ is
not):

It is reasonable, and possibly best, to implement the usual
non-overlapping case by a call to memcpy() and not try hard to
optimize the overlapping case (or jump far away to it to keep
caches cleaner for the usual case).

head/sys/arm64/arm64/memmove.S ( -r307909 ) has:
...
head/lib/libc/amd64/string/bcopy.S has:
...

Same organisation as for plain arm, so good for libc and not so good fot
the kernel.

mips is backwards for bcopy() and implements it as
     memmove().  mips has 2 separate memcpy*.S files, one for plain arm
     and one for xscale.  The plain arm one is smaller than memmove.S,
     so presumably doesn't support overlapped copies.  The xscale one
     is much larger, so has space for overlapped copies and many optimizations
     that no longer work.  I don't know what it does.

The mix of "mips" and "arm"/"xscale" above confused me. Looking
around this seems to be referencing head/lib/libc/arm/string/
materials instead of head/lib/libc/mips/string/ or
head/sys/mips/mips/ materials.

I meant arm here.

mips does have some MI string instructions in libc which I missed or
got confused before.  Also, aarch64 has some which I missed because the
they are obfuscated by putting them in contrib instead of under libc.
mips implements all of bcopy, memcpy and memmove in bcopy.S, and seems
to be indeed like amd64 and i386 -- none of these arches bothers to
optimize memcpy by not supporting overlapping copies in it, exactly
as documented.

However, the documentation is still incorrect.  Without -ffreestanding,
compilers can and do often implement memcpy inline.  Then overlapping
copies are not supported.  The BUGS section in memcpy.3 should just be
deleted.

Bruce
_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "[email protected]"

Reply via email to