Re: alternatives to 'strlcpy'
On 09/28/2017 12:36 PM, Dmitry Selyutin wrote: ptrdiff_t is not a good choice here, > because it represents not length, but difference between pointers They're the same concept in C. In 7th Edition Unix strlen returned int, and this was better than having it return an unsigned type due to common problems with wraparound arithmetic and comparisons with unsigned types. In GNU Emacs source code, the general style has been to prefer signed integers for indexes and sizes and the like, and this has helped us gain reliability because we can catch integer overflows better by compiling with -fsanitize=undefined, bugs that we couldn't catch so easily if we used unsigned integers. This is why I suggest ptrdiff_t for new low-level APIs. Although it limits object size to PTRDIFF_MAX, that limitation is present already because in practice behavior is undefined when you compute p1-p2 when ((char *) p1 - (char *) p2) does not fit into ptrdiff_t, which means that objects larger than PTRDIFF_MAX are dangerous and should be avoided anyway. ssize_t is a worse choice for portable software, as it is not in stddef.h and on a few old platforms ssize_t is narrower than size_t (POSIX allows this). All this is moot of course if any new function returns bool or void, as I suspect it should.
Re: alternatives to 'strlcpy'
On 09/28/2017 11:39 AM, Bruno Haible wrote: in BSD, it is common practice to try a call with a fixed-size stack-allocated or statically allocated buffer, and try dynamic memory only when this first attempt fails. This doesn't match my experience with BSD application code, where the most common practice is to call strlcpy and ignore the return value. I looked at the source for OpenSSH 7.5 (the current version). Of its 56 calls to strlcpy, only 15 check the return value, and none of these calls try dynamic allocation later (they all simply fail in the caller somehow). Of the 41 calls that ignore the return value, sometimes this is a bug and sometimes not, and often when it is not a bug plain strcpy would work as well. So the strlcpy API appears to be a poor fit for OpenSSH's needs. /* Copies the string SRC, possibly truncated, into the bounded memory area [DST,...,DST+DSTSIZE-1]. If the result fits, it returns 0. If DSTSIZE is 0, it returns -1 and sets errno to EINVAL. If DSTSIZE is nonzero and the result does not fit, it produces a NUL- terminated truncated result, and returns -1 and sets errno to E2BIG. */ extern int strgcpy (char *__dst, size_t __dstsize, const char *__src) __attribute__ ((__nonnull__ (1, 3))) __attribute__ ((__warn_unused_result__)); Some comments: * I suggest a name that does not begin with "str", as that's reserved to the implementation. * I too am bothered by this function setting errno (and using two different errno values to boot); it'd be simpler for it to just to return a value, or to abort if the destination is too small. * If the intent is that this function be used for test-case code that aborts if the source is too long, then I suggest that the function simply return 'void' and abort if the source is too long, as that would be more convenient for the callers. If the intent is something else, I'd like to know what it is so that I can think about it more.
Re: alternatives to 'strlcpy'
>> Though I would change its ssize_t >> to ptrdiff_t, so that it depends only on stddef.h. > - The mixed use of ssize_t vs. size_t. What's wrong with ssize_t? As for me, ptrdiff_t is not a good choice here, because it represents not length, but difference between pointers. String length IS a difference between pointer in some way, but well, I think that's not the first allegory that comes to mind when one talks about strings. As for me, the only reason to avoid ssize_t is a dependency to stddef.h, but well, I think it is going to be used in most projects anyway, especially if these projects are somewhat complex. I thought size_t also comes from stddef, but I may be wrong. :-) > - The argument order. That's also I think largely a historical issue. BTW, on x86_64 it makes some sense due to use of $rdi/$rsi. The only register that must be copied is $rcx. After then, one can implement memcpy/strcpy in just a few lines of assembly. Anyway, I agree that the argument order should better be changed, to avoid confusing strgcpy with its relatives (strncpy, strlcpy, strscpy). > If DSTSIZE is 0, it returns -1 and sets errno to EINVAL. > If DSTSIZE is nonzero and the result does not fit, it produces a NUL- > terminated truncated result, and returns -1 and sets errno to E2BIG. I really dislike the idea of setting errno, since it involves thread-local storage. I know it's a "cold" path, but nevertheless, I really favor returning error codes if it is possible. And it is possible with ssize_t. Moreover, more modern POSIX functions do actually favor returning error code directly. Anyway, I really think it is a good idea to provide yet another function, for me it seems the most logical choice; I cannot guess what "g" in strgcpy means though. Guarded? GNU? Generic? Godlike? :-) The only possible reason to avoid doing such change is that some people may think that these Open Source guys went crazy and invented at least three functions to copy strings (four, including strscpy from the Linux kernel); moreover, since some people do not actually know the difference between strcpy and strcat, the situation is even worse. :-) -- With best regards, Dmitry Selyutin signature.asc Description: This is a digitally signed message part.
Re: alternatives to 'strlcpy'
Paul Eggert wrote: > If you really want a function whose semantics are like strlcpy's but has > __warn_unused_result__ semantics ..., then I suppose we could > invent one and use it for Gnulib. But we should not call it strlcpy Yes, I do want such a function for copying a string to a bounded memory area. What you don't like about strlcpy: - The name. - The return value: why compute strlen(src) when the function does not otherwise need it? The reason for this design is that in BSD, it is common practice to try a call with a fixed-size stack-allocated or statically allocated buffer, and try dynamic memory only when this first attempt fails. The return value is thus useful for the second step. For system calls this is a reasonable design, but not here: the caller can call strlen(src) by himself. It is strange design to do the first part of the second step in the function which is supposed to do the first step. My take on this is that such an algorithm should be abstracted in a facility of its own, like or . Other things I don't like about strlcpy: - It's hard to remember whether the return value should be tested as >= dstsize, > dstsize, <= dstsize, < dstsize. As Jim writes [1] "Some of us forget to read documentation. Or read it, and then forget details." - The argument order: while snprintf, strftime, strfmon, and others receive the destination buffer size immediately following the destination buffer address, strlcpy wants it as third argument. Which is against the principle "keep together what belongs together". Dmitry suggested to look at strscpy [2]. About this one I don't like - The return value convention: It's suitable for in-kernel APIs only. - The mixed use of ssize_t vs. size_t. - The argument order. Here's my take: /* Copies the string SRC, possibly truncated, into the bounded memory area [DST,...,DST+DSTSIZE-1]. If the result fits, it returns 0. If DSTSIZE is 0, it returns -1 and sets errno to EINVAL. If DSTSIZE is nonzero and the result does not fit, it produces a NUL- terminated truncated result, and returns -1 and sets errno to E2BIG. */ extern int strgcpy (char *__dst, size_t __dstsize, const char *__src) __attribute__ ((__nonnull__ (1, 3))) __attribute__ ((__warn_unused_result__)); Bruno [1] https://meyering.net/crusade-to-eliminate-strncpy/ [2] https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.html