Mats Wichmann wrote in
 <a0a83f75-de97-4cb1-9e8e-0cad322fd...@wichmann.us>:
 |On 11/7/24 14:41, Steffen Nurpmeso wrote:
 |> So it standardizes behaviour as it exists in real life
 |> applications.
 |> (This is pretty unfortunate.)

 |As I'm sure you know, standards workgroups tend to operate in accordance 
 |with a charter that bounds their work.  These vary widely depending on 
 |circumstances and the chartering organization(s), but it's not uncommon 
 |for projects - POSIX being one of those - to be set up to standardize 
 |existing practice to provide incentive for various implementations not 
 |to end up diverging from such practice without good reason. It's a 
 |little harsh to characterize operating in accordance with one's charter 
 |as "pretty unfortunate".

Please see below.

 --End of <a0a83f75-de97-4cb1-9e8e-0cad322fd...@wichmann.us>

Solar Designer wrote in
 <20241108001759.ga15...@openwall.com>:
 |On Thu, Nov 07, 2024 at 10:41:59PM +0100, Steffen Nurpmeso wrote:
 |> Steffen Nurpmeso wrote in
 |>  <20241107210420.v7ZcHYHZ@steffen%sdaoden.eu>:
 |>|Solar Designer wrote in
 |>| <20241107041658.ga10...@openwall.com>:
 |>||On Thu, Nov 07, 2024 at 01:08:19AM +0100, Steffen Nurpmeso wrote:
 |>||> To add that the POSIX core developers mention (APPLICATION USAGE):
 |>||> 
 |>||>   It should be noted that using find with -print0 to pipe input to
 |>||>   xargs -r0 is less safe than using find with -exec because if
 |>||>   find -print0 is terminated after it has written a partial
 |>||>   pathname, the partial pathname may be processed as if it was
 |>||>   a complete pathname.
 |>||
 |>||Shouldn't that behavior be treated as an xargs implementation bug or at
 |>||least shortcoming, and fixed as such?  I hope POSIX doesn't require it?
 |> 
 |> POSIX.1-2024 says, for xargs, on page 3600, lines 123174 ff.:
 |> 
 |>   If the -0 option is specified, the application shall ensure that
 |>   arguments in the standard input are delimited by null bytes.
 |>   If multiple adjacent null bytes occur in the input, each null
 |>   byte shall be treated as a delimiter.
 |>   If the standard input is not empty and does not end with a null
 |>   byte, xargs should ignore the trailing non-null bytes (as this
 |>   can signal incomplete data) but may use them as the last
 |>   argument passed to utility.
 |> 
 |> So it standardizes behaviour as it exists in real life
 |> applications.
 |> (This is pretty unfortunate.)
 |
 |Actually, to me the above reads like it merely allows the current
 |behavior ("may"), but encourages change ("should").  That's good.

Well it actually even says (on page 3606)

  FUTURE DIRECTIONS
    A future version of this standard may require that, when the
    −0 option is specified, if the standard input is not empty and
    does not end with a null byte, xargs ignores the trailing non-
    null bytes.

but -- as can be seen -- people do not read (all) the docs.
A pointer to FUTURE DIRECTIONS in the running text would have kept
me silent.

 |My only complaint is that "ignore" doesn't suggest this resulting in a
 |non-zero exit status from xargs.  POSIX allows exit status in the range
 |of 1 to 125 if, among other possibilities, "some other error occurred".
 |So I think a non-zero exit status in that range on this condition isn't
 |too far from being compliant.
 |
 |>   ...
 |>|A first thought is that the now really included (four decades too
 |>|late!) sh(1)ell's "pipefail" option was agreed upon long after the
 |>|text above appeared for the -print0/-r0 addition.  If that is true
 |>|the above text is anyway a correct statement less the partial
 |>|pathname because the undesired "termination" will not be reflected
 |>|in the exit status of the pipe.
 |
 |It will be when "pipefail" is present and enabled, and even if not it's
 |extra and different impact - not indicating error to further commands
 |(which may or may not matter in a given case) vs. also processing of an
 |unintended file (truncated filename) by this very command.
 |
 |>||In other words, if the input stream to "xargs -0" doesn't end in a NUL,
 |>||xargs must not process the last maybe-partial string.  I've just checked
 |>|
 |>|Other than that i would agree.
 |>|
 |>||GNU findutils xargs (not the latest version, though) and it does have
 |>||this problem - something we'd want to fix?
 |>|
 |>|From a glance "git show master:findutils/xargs.c::process0_stdin()"
 |>|of busybox also does
 |>  ...
 |>|So then the above paragraph even reflects code reality.
 |
 |So it looks like we can fix/enhance xargs in this way in both GNU
 |findutils and Busybox findutils and perhaps elsewhere.  It would also be
 |interesting to know if any implementations exist that already "ignore
 |the trailing non-null bytes".

It seems to me the xargs(1) of the BSDs have a common root with
identical comments, variables (zflag == -0) etc., but slightly
diverged code bases "thereafter"; i am not going to dig into that
now, but the ones on my running f-1400, n-1000 and o-0705 (i do
not have OpenBSD 7.6 yet) all seem to interpret the trailer.

  #|f-1400:~$ printf 'a\0b\0c' | xargs -0 printf '<%s>\n'
  <a>
  <b>
  <c>

On OpenIndiana "2024" i see

  #?0|oi-2024:steffen$ printf 'a\0b\0c' | xargs -0 printf '<%s>\n'
  <a>
  <b>
  <c>
  #?0|oi-2024:steffen$ command -v printf xargs
  printf
  /bin/xargs

(xargs also via /usr/gnu/bin/xargs as you say)
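
For anyone who wants to repeat this on other systems, the check above
can be wrapped into a tiny probe.  This is only a sketch:
classify_trailer is a made-up helper name, and it assumes the local
xargs knows -0 and that an external printf(1) is in PATH.

```shell
#!/bin/sh
# Probe how the xargs found in PATH treats -0 input that does not
# end in a NUL byte.  classify_trailer is a hypothetical helper name.
classify_trailer() {
    # Feed two terminated arguments and one unterminated trailer.
    out=$(printf 'a\0b\0c' | xargs -0 printf '<%s>\n' 2>/dev/null)
    case $out in
    *'<c>'*)       echo uses-trailer ;;    # the "may" behaviour
    *'<a>'*'<b>'*) echo ignores-trailer ;; # the "should" behaviour
    *)             echo unknown ;;
    esac
}

classify_trailer
```

All the systems i listed above would report "uses-trailer" here.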

 |Another reason for this safer behavior is that it's also more consistent
 |with respect to empty strings.  If "trailing non-null bytes" are passed
 |"as the last argument", then this only occurs if the last argument is
 |non-empty.  Yet xargs otherwise does support empty arguments, except for
 |the last non-null-terminated one.  We'd be removing this inconsistency.

Seems to require changing every xargs(1) i have around.

Which means the standard *very much* requires changes in the
future ...  So i take back the "unfortunate".
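
Until the implementations change, a wrapper can approximate the safer
behaviour.  The following is a minimal sketch, not a fix for xargs
itself: ends_with_nul and safe_xargs0 are made-up names, mktemp(1) is
assumed to be available, and the whole input is buffered in
a temporary file, so arguments are not processed as a stream.  Instead
of silently ignoring the trailer it refuses to run at all and returns
125, in the spirit of the non-zero exit status Alexander suggests.

```shell
#!/bin/sh
# ends_with_nul and safe_xargs0 are hypothetical helper names.

# Succeed if FILE is empty or its last byte is NUL.
ends_with_nul() {
    [ -s "$1" ] || return 0
    # Dump the last byte as hex; od pads with blanks, so strip them.
    last=$(tail -c 1 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$last" = 00 ]
}

# Usage: producer | safe_xargs0 utility [argument...]
# Buffers stdin, then runs "xargs -0" only if the input is complete.
safe_xargs0() {
    tmp=$(mktemp) || return 125
    cat >"$tmp"
    if ends_with_nul "$tmp"; then
        xargs -0 "$@" <"$tmp"
        rc=$?
    else
        echo 'safe_xargs0: input does not end in NUL, refusing' >&2
        rc=125
    fi
    rm -f "$tmp"
    return "$rc"
}
```

Used as "find . -type f -print0 | safe_xargs0 utility".  It only
catches a partial last pathname; a find that dies right after
a complete, NUL-terminated pathname still has to be detected via its
exit status.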

 |Alexander
 --End of <20241108001759.ga15...@openwall.com>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear
