On Mon, 13 Nov 2017, Warner Losh wrote:

 Add notes about overlapping copies.

 Add notes to each of these that specifically state that results are
 undefined if the strings overlap. In the case of memcpy, we document
 the overlapping behavior on FreeBSD (pre-existing). For str*, it is
 left unspecified, however, since the default (and x86) implementations
 do not handle overlapping strings properly.

 PR: 223653
 Sponsored by: Netflix

Modified:
 head/lib/libc/string/memcpy.3
 head/lib/libc/string/strcat.3
 head/lib/libc/string/strcpy.3

C standards and POSIX have much better wording.

Modified: head/lib/libc/string/memcpy.3
==============================================================================
--- head/lib/libc/string/memcpy.3       Mon Nov 13 16:53:36 2017        
(r325764)
+++ head/lib/libc/string/memcpy.3       Mon Nov 13 17:04:44 2017        
(r325765)
@@ -54,6 +54,11 @@ bytes from string
.Fa src
to string
.Fa dst .
+If
+.Fa src
+and
+.Fa dst
+overlap, the results are not defined.
.Sh RETURN VALUES
The
.Fn memcpy

The bugs here are:
- src and dst are pointers.  Overlap for pointers makes no sense except
  for pointers to single objects but the pointers here are not for that.
  Standards use the correct description "objects that overlap".
- style bug: verboseness in the nonsense, by naming the non-objects and
  then having to mark up their names.  A full description would be much
  longer, especially for non-counted copies.  For strcpy(), the relevant
  obects are the array of char beginning at .Fa src and ending at its
  terminating NUL, and the array of char beginning at .Fa dst with the
  same number of elements as the source array.  Standards omit all details
  and say simple "objects".  memcpy() only has a count so it is simpler.
  strncpy() has a count in addition to the NUL terminators, so it is more
  complicated.
- use of the wrong term "results".  The correct term is "behavior".  Results
  are are worst implementation-defined in Standards, since it is almost
  impossible for a result to be undefined without the whole behaviour also
  being undefined.  These functions have only 1 result.  In Standard wording,
  function results are return values, not the whole behavior of the function.
- use of the non-technical term "not defined".  Standards use the technical
  term "undefined".

The Standards wording is "If copying takes place between objects that
overlap, the behaviour is undefined".

Modified: head/lib/libc/string/strcat.3
Modified: head/lib/libc/string/strcpy.3
- as above, except "undefined" is spelled correctly

Grepping for "overlap" in draft C99 shows the following other functions
with overlap giving undefined behavior:
   snprintf, sprintf, sscanf, vsnprintf, vsprintf,
   mbstowcs, wcstombs,
   strxfrm, strftime.
   all functions in <wchar.h> where overlap might occur, except when
   explicitly specified (seems to be only explicitly specified for
   wmemmove)

The term "overlap" occurs in 193 section 3 man pages in -current (counting
links as 1 each).  It isi still never used for any of the above functions.
It is used in the following related man pages:
- bcmp.3.  This use is nonsense.  Overlap doesn't cause any problems for
  bcmp().
- bcopy.3 and memmove.3.  This use is not nonsense, but its wording is much
  worse than in Standards.  It says "The .Fn function copies .Fa len bytes
  from string .Fa src to string .Fa dst.  The two strings may overlap.".
  But these functions don't even copy strings, and "may overlap" gives no
  details.  Standards say "The .Fn function copies .Fn n characters from
  the object pointed to be .Fa s2 into the object pointed to by .Fa s1.
  [Complicated 5-line sentence saying that the copying acts as if it is
  done through a temporary object that doesn't overlap.]".  I think standards
  say somewhere, but man pages say nowhere, that copying between
  non-overlapping objects has the defined behaviour of copying the bytes
  (u_char's) it is done by assignment of bytes or a function like memmove()
  (copying of general objects is more complicated since it may skip padding
  or otherwise fail to preserve the representation).  This is needed so that
  the 5-line sentence doesn't need to be much longer.
- memcpy.3.  It was already documented in the BUGS section that memcpy() is
  implemented as bcopy() and therefore the "strings" "may overlap" [then
  portability considerations due to this].  This is buggier than before:
  - old bug: only the MI implementation of memcpy() clearly implements
    memcpy() as bcopy().  The amd64, i386 and arm MD implementations
    do the same.  mips is backwards for bcopy() and implements it as
    memmove().  mips has 2 separate memcpy*.S files, one for plain arm
    and one for xscale.  The plain arm one is smaller than memmove.S,
    so presumably doesn't support overlapped copies.  The xscale one
    is much larger, so has space for overlapped copies and many optimizations
    that no longer work.  I don't know what it does.  (The i386 and amd64
    ones have a few small optimizations that stopped working 25 years ago.
    At least they are short.)
  - new bug: the addition is inconsistent with the BUGS section.
- strlcpy.c.  This uses a combination of the C99 wording for strcpy() and
  buggy wording involving the pointer names (for both lcat and lcpy):
  If the .Fa src and .Fa dst strings overlap, the behavior is undefined.
  Now it is even better to not give the details about the objects, since
  at least for dst, the object isn't the string pointed to by dst --
  it is the char array of length MIN(dstsize, strlen(dst) + 1).  Similarly
  for old strncat() and strncpy() and not so old snprintf().  For the
  standard 'n' functions, the object is clearly limited by the dstsize arg
  and there is no problem in theory or practice if the dst string overlaps
  beyond the part that is written to, so it is just a bug in the
  specification = man page of the 'l' functions to have a stricter limit
  on overlap.

Grepping for "overlap" in draft POSIX.1-2001 shows the following additional
functions with overlap giving undefined behavior:
    bcopy: POSIX duplicates the C99 wording for memcpy, but for bcopy it
      has much the same bugs as this change: it uses "the area pointed to
      be .Fa s1", etc.
    memccpy, swab
    independent details for functions in <wchar.h> (but POSIX combines the
    details for at least sprintf() and snprintf()).

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to