Mats Wichmann wrote in <a0a83f75-de97-4cb1-9e8e-0cad322fd...@wichmann.us>: |On 11/7/24 14:41, Steffen Nurpmeso wrote: |> So it standardizes behaviour as it exists in real life |> applications. |> (This is pretty unfortunate.)
|As I'm sure you know, standards workgroups tend to operate in accordance |with a charter that bounds their work. These vary widely depending on |circumstances and the chartering organization(s), but it's not uncommon |for projects - POSIX being one of those -to be set up to standardize |existing practice to provide incentive for various implementations not |to end up diverging from such practice without good reason. It's a |little harsh to characterize operating in accordance with one's charter |as "pretty unfortunate". Please see below. --End of <a0a83f75-de97-4cb1-9e8e-0cad322fd...@wichmann.us> Solar Designer wrote in <20241108001759.ga15...@openwall.com>: |On Thu, Nov 07, 2024 at 10:41:59PM +0100, Steffen Nurpmeso wrote: |> Steffen Nurpmeso wrote in |> <20241107210420.v7ZcHYHZ@steffen%sdaoden.eu>: |>|Solar Designer wrote in |>| <20241107041658.ga10...@openwall.com>: |>||On Thu, Nov 07, 2024 at 01:08:19AM +0100, Steffen Nurpmeso wrote: |>||> To add that the POSIX core developers mention (APPLICATION USAGE): |>||> |>||> It should be noted that using find with -print0 to pipe input to |>||> xargs -r0 is less safe than using find with -exec because if |>||> find -print0 is terminated after it has written a partial |>||> pathname, the partial pathname may be processed as if it was |>||> a complete pathname. |>|| |>||Shouldn't that behavior be treated as an xargs implementation bug or at |>||least shortcoming, and fixed as such? I hope POSIX doesn't require it? |> |> POSIX.1-2024 says, for xargs, on page 3600, lines 123174 ff.: |> |> If the -0 option is specified, the application shall ensure that |> arguments in the standard input are delimited by null bytes. |> If multiple adjacent null bytes occur in the input, each null |> byte shall be treated as a delimiter. |> If the standard input is not empty and does not end with a null |> byte, xargs should ignore the trailing non-null bytes (as this |> can signal incomplete data) but may use them as the last |> argument passed to utility. |> |> So it standardizes behaviour as it exists in real life |> applications. |> (This is pretty unfortunate.) | |Actually, to me the above reads like it merely allows the current |behavior ("may"), but encourages change ("should"). That's good. Well it actually even says (on page 3606) FUTURE DIRECTIONS A future version of this standard may require that, when the −0 option is specified, if the standard input is not empty and does not end with a null byte, xargs ignores the trailing non- null bytes. but -- as can be seen -- people do not read (all) the docs. A reference to the future in the running doc would have made me silent. |My only complaint is that "ignore" doesn't suggest this resulting in a |non-zero exit status from xargs. POSIX allows exit status in the range |of 1 to 125 if, among other possibilities, "some other error occurred". |So I think a non-zero exit status in that range on this condition isn't |too far from being compliant. | |> ... |>|A first thought is that the now really included (four decades too |>|late!) sh(1)ell's "pipefail" option was agreed upon long after the |>|text above appeared for the -print0/-r0 addition. If that is true |>|the above text is anyway a correct statement less the partial |>|pathname because the undesired "termination" will not be reflected |>|in the exit status of the pipe. | |It will be when "pipefail" is present and enabled, and even if not it's |extra and different impact - not indicating error to further commands |(which may or may not matter in a given case) vs. also processing of an |unintended file (truncated filename) by this very command. | |>||In other words, if the input stream to "xargs -0" doesn't end in a NUL, |>||xargs must not process the last maybe-partial string. I've just checked |>| |>|Other than that i would agree. |>| |>||GNU findutils xargs (not the latest version, though) and it does have |>||this problem - something we'd want to fix? |>| |>|From a glance "git show master:findutils/xargs.c::process0_stdin()" |>|of busybox also does |> ... |>|So then the above paragraph even reflects code reality. | |So it looks like we can fix/enhance xargs in this way in both GNU |findutils and Busybox findutils and perhaps elsewhere. It would also be |interesting to know if any implementations exist that already "ignore |the trailing non-null bytes". It seems to me the xargs(1) of the BSDs have a common root with identical comments, variables (zflag == -0) etc, but slightly diverged code bases "thereafter"; .. not going to dig that stuff now, .. but running f-1400, n-1000 and o-0705 (i do not have OpenBSD 7.6) yet) all interpret the trailer it seems. #|f-1400:~$ printf 'a\0b\0c' | xargs -0 printf '<%s>\n' <a> <b> <c> On OpenIndiana "2024" i see #?0|oi-2024:steffen$ printf 'a\0b\0c' | xargs -0 printf '<%s>\n' <a> <b> <c> #?0|oi-2024:steffen$ command -v printf xargs printf /bin/xargs (xargs also via /usr/gnu/bin/xargs as you say) |Another reason for this safer behavior is that it's also more consistent |with respect to empty strings. If "trailing non-null bytes" are passed |"as the last argument", then this only occurs if the last argument is |non-empty. Yet xargs otherwise does support empty arguments, except for |the last non-null-terminated one. We'd be removing this inconsistency. Seems to require changing any xargs(1) i have around. Which makes the standard *very much* requiring changes for the future ... So i take back the "unfortunate". |Alexander --End of <20241108001759.ga15...@openwall.com> --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) | |And in Fall, feel "The Dropbear Bard"s ball(s). | |The banded bear |without a care, |Banged on himself fore'er and e'er | |Farewell, dear collar bear