Re: alternatives to 'strlcpy'

2017-09-28 Thread Paul Eggert

On 09/28/2017 12:36 PM, Dmitry Selyutin wrote:

ptrdiff_t is not a good choice here,  > because it represents not length, but 
difference between pointers
They're the same concept in C. In 7th Edition Unix strlen returned int, 
and this was better than having it return an unsigned type due to common 
problems with wraparound arithmetic and comparisons with unsigned types. 
In GNU Emacs source code, the general style has been to prefer signed 
integers for indexes and sizes and the like, and this has helped us gain 
reliability because we can catch integer overflows better by compiling 
with -fsanitize=undefined, bugs that we couldn't catch so easily if we 
used unsigned integers.


This is why I suggest ptrdiff_t for new low-level APIs. Although it 
limits object size to PTRDIFF_MAX, that limitation is present already 
because in practice behavior is undefined when you compute p1-p2 when 
((char *) p1 - (char *) p2) does not fit into ptrdiff_t, which means 
that objects larger than PTRDIFF_MAX are dangerous and should be avoided 
anyway.


ssize_t is a worse choice for portable software, as it is not in 
stddef.h and on a few old platforms ssize_t is narrower than size_t 
(POSIX allows this).


All this is moot of course if any new function returns bool or void, as 
I suspect it should.






Re: alternatives to 'strlcpy'

2017-09-28 Thread Paul Eggert

On 09/28/2017 11:39 AM, Bruno Haible wrote:

in BSD, it is common practice to try
a call with a fixed-size stack-allocated or statically allocated buffer,
and try dynamic memory only when this first attempt fails.


This doesn't match my experience with BSD application code, where the 
most common practice is to call strlcpy and ignore the return value. I 
looked at the source for OpenSSH 7.5 (the current version). Of its 56 
calls to strlcpy, only 15 check the return value, and none of these 
calls try dynamic allocation later (they all simply fail in the caller 
somehow). Of the 41 calls that ignore the return value, sometimes this 
is a bug and sometimes not, and often when it is not a bug plain strcpy 
would work as well. So the strlcpy API appears to be a poor fit for 
OpenSSH's needs.



/* Copies the string SRC, possibly truncated, into the bounded memory area
[DST,...,DST+DSTSIZE-1].
If the result fits, it returns 0.
If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
terminated truncated result, and returns -1 and sets errno to E2BIG.  */
extern int strgcpy (char *__dst, size_t __dstsize, const char *__src)
   __attribute__ ((__nonnull__ (1, 3)))
   __attribute__ ((__warn_unused_result__));


Some comments:

* I suggest a name that does not begin with "str", as that's reserved to 
the implementation.


* I too am bothered by this function setting errno (and using two 
different errno values to boot); it'd be simpler for it to just to 
return a value, or to abort if the destination is too small.


* If the intent is that this function be used for test-case code that 
aborts if the source is too long, then I suggest that the function 
simply return 'void' and abort if the source is too long, as that would 
be more convenient for the callers. If the intent is something else, I'd 
like to know what it is so that I can think about it more.





Re: alternatives to 'strlcpy'

2017-09-28 Thread Dmitry Selyutin
>> Though I would change its ssize_t 
>> to ptrdiff_t, so that it depends only on stddef.h.
>   - The mixed use of ssize_t vs. size_t.
What's wrong with ssize_t? As for me, ptrdiff_t is not a good choice here, 
because it represents not length, but difference between pointers. String 
length IS a difference between pointer in some way, but well, I think that's 
not the first allegory that comes to mind when one talks about strings. As for 
me, the only reason to avoid ssize_t is a dependency to stddef.h, but well, I 
think it is going to be used in most projects anyway, especially if these 
projects are somewhat complex. I thought size_t also comes from stddef, but I 
may be wrong. :-)

>   - The argument order.
That's also I think largely a historical issue. BTW, on x86_64 it makes some 
sense due to use of $rdi/$rsi. The only register that must be copied is $rcx. 
After then, one can implement memcpy/strcpy in just a few lines of assembly. 
Anyway, I agree that the argument order should better be changed, to avoid 
confusing strgcpy with its relatives (strncpy, strlcpy, strscpy).

>   If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
>   If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
>   terminated truncated result, and returns -1 and sets errno to E2BIG.
I really dislike the idea of setting errno, since it involves thread-local 
storage. I know it's a "cold" path, but nevertheless, I really favor returning 
error codes if it is possible. And it is possible with ssize_t. Moreover, more 
modern POSIX functions do actually favor returning error code directly.


Anyway, I really think it is a good idea to provide yet another function, for 
me it seems the most logical choice; I cannot guess what "g" in strgcpy means 
though. Guarded? GNU? Generic? Godlike? :-)
The only possible reason to avoid doing such change is that some people may 
think that these Open Source guys went crazy and invented at least three 
functions to copy strings (four, including strscpy from the Linux kernel); 
moreover, since some people do not actually know the difference between strcpy 
and strcat, the situation is even worse. :-)

-- 
With best regards,
Dmitry Selyutin

signature.asc
Description: This is a digitally signed message part.


Re: alternatives to 'strlcpy'

2017-09-28 Thread Bruno Haible
Paul Eggert wrote:
> If you really want a function whose semantics are like strlcpy's but has 
> __warn_unused_result__ semantics ..., then I suppose we could 
> invent one and use it for Gnulib. But we should not call it strlcpy

Yes, I do want such a function for copying a string to a bounded memory area.

What you don't like about strlcpy:
  - The name.
  - The return value: why compute strlen(src) when the function does not
otherwise need it?

The reason for this design is that in BSD, it is common practice to try
a call with a fixed-size stack-allocated or statically allocated buffer,
and try dynamic memory only when this first attempt fails. The return value
is thus useful for the second step. For system calls this is a reasonable
design, but not here: the caller can call strlen(src) by himself. It is
strange design to do the first part of the second step in the function
which is supposed to do the first step. My take on this is that such
an algorithm should be abstracted in a facility of its own, like
 or .

Other things I don't like about strlcpy:
  - It's hard to remember whether the return value should be tested
as >= dstsize, > dstsize, <= dstsize, < dstsize. As Jim writes [1]
"Some of us forget to read documentation. Or read it, and then forget
 details."
  - The argument order: while snprintf, strftime, strfmon, and others
receive the destination buffer size immediately following the
destination buffer address, strlcpy wants it as third argument.
Which is against the principle "keep together what belongs together".

Dmitry suggested to look at strscpy [2]. About this one I don't like
  - The return value convention: It's suitable for in-kernel APIs only.
  - The mixed use of ssize_t vs. size_t.
  - The argument order.

Here's my take:

/* Copies the string SRC, possibly truncated, into the bounded memory area
   [DST,...,DST+DSTSIZE-1].
   If the result fits, it returns 0.
   If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
   If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
   terminated truncated result, and returns -1 and sets errno to E2BIG.  */
extern int strgcpy (char *__dst, size_t __dstsize, const char *__src)
  __attribute__ ((__nonnull__ (1, 3)))
  __attribute__ ((__warn_unused_result__));

Bruno

[1] https://meyering.net/crusade-to-eliminate-strncpy/
[2] https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.html