bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-18 Thread vampyrebat
$ wc --version
wc (GNU coreutils) 8.29
Packaged by Gentoo (8.29-r1 (p1.0))

The man page for wc states: "A word is a... sequence of characters delimited by 
white space."

But its concept of white space only seems to include ASCII white space.  U+00A0 
NO-BREAK SPACE, for instance, is not recognized.

If your terminal displays UTF-8 encoding:

printf 'how are\xC2\xA0you\n'

or if your terminal displays ISO 8859-1 encoding:

printf 'how are\xA0you\n'

the visible output of this printf is "how are you".  In either case, wc does 
not recognize the second space as white space, resulting in an incorrect word 
count:

$ printf 'how are\xC2\xA0you\n' | LC_ALL=en_US.utf8 wc -w
2
$ printf 'how are\xA0you\n' | LC_ALL=en_US.iso88591 wc -w
2
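
A possible workaround until wc handles this, sketched in sh: translate the
UTF-8 NO-BREAK SPACE bytes to an ASCII space before counting. The function
name count_words_nbsp is made up for illustration, and the sketch assumes
UTF-8 input only (ISO 8859-1 input would need the single byte \240 instead).

```sh
# Sketch of a workaround: map UTF-8 NO-BREAK SPACE (bytes 0xC2 0xA0) to an
# ASCII space before counting.  Octal escapes are used because \x is not
# portable across printf implementations.
count_words_nbsp() {
  nbsp=$(printf '\302\240')
  # LC_ALL=C makes sed treat the input as plain bytes.
  LC_ALL=C sed "s/$nbsp/ /g" | wc -w | tr -d ' '
}

printf 'how are\302\240you\n' | count_words_nbsp
```

With the no-break space mapped to a plain space, the count for the sample
line is 3.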

bug#34488: Add sort --limit, or document workarounds for sort|head error messages

2019-02-18 Thread Assaf Gordon

Hello,

Thanks for all comments (on and off list).
Attached an updated patch with documentation.

The supported options are:

 --default-signal[=SIG]  reset signal SIG to its default signal handler.
                         without SIG, all known signals are included.
                         multiple signals can be comma-separated.
 --ignore-signal[=SIG]   set signal SIG to be IGNORED.
                         without SIG, all known signals are included.
                         multiple signals can be comma-separated.
 -p                      same as --default-signal=PIPE

(Lower-case "-p" was chosen so as not to conflict with BSD, but it can of
course be changed to another letter.)

The new 'env-signal-handler.sh' test passes on GNU/linux, non-gnu/linux
(alpine), and Free/Open/Net BSD.
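
For comparison, the "ignore" direction can already be approximated in plain
sh, because an ignored signal is inherited across exec. The sketch below is
only an illustration of that behaviour, not part of the patch; run_ignoring
is a made-up helper name.

```sh
# Rough sh equivalent of "env --ignore-signal=SIG command...":
# ignore the signal in a subshell, then exec the command so that it
# inherits the ignored disposition.
run_ignoring() {
  sig=$1
  shift
  ( trap '' "$sig"; exec "$@" )
}

# The child sends SIGINT to itself; since INT is ignored, it keeps running.
run_ignoring INT sh -c 'kill -INT $$; echo survived'
```

The opposite direction (resetting an inherited ignored signal to its default)
is the case that cannot be done from within the shell, which is what
--default-signal is meant to cover.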

Comments are very welcome,
 - assaf

From 3542f1762c9f14e2275fe5e61d5d7f6275b420a9 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Fri, 15 Feb 2019 12:31:48 -0700
Subject: [PATCH] env: new options -p/--default-signal=SIG/--ignore-signal=SIG

New options to set signal handlers to default (SIG_DFL) or ignore
(SIG_IGN).  This is useful to overcome the POSIX limitation that a shell
must not override inherited signal state, e.g. the second 'trap' here is
a no-op:

   trap '' PIPE && sh -c 'trap - PIPE ; seq inf | head -n1'

Instead use:

   trap '' PIPE && sh -c 'env -p seq inf | head -n1'

Similarly, the following will prevent CTRL-C from terminating the
program:

   env --ignore-signal=INT seq inf > /dev/null

See https://bugs.gnu.org/34488#8 .

* NEWS: Mention new options.
* doc/coreutils.texi (env invocation): Document new options.
* man/env.x: Add example of --default-signal=SIG usage.
* src/env.c (signals): New global variable.
(shortopts,longopts): Add new options.
(usage): Print new options.
(parse_signal_params): Parse comma-separated list of signals, store in
signals variable.
(reset_signal_handlers): Set each signal to SIG_DFL/SIG_IGN.
(main): Process new options.
* src/local.mk (src_env_SOURCES): Add operand2sig.c.
* tests/misc/env-signal-handler.sh: New test.
* tests/local.mk (all_tests): Add new test.
---
 NEWS                             |   3 +
 doc/coreutils.texi               |  43
 man/env.x                        |  35 ++
 src/env.c                        | 127 +-
 src/local.mk                     |   1 +
 tests/local.mk                   |   1 +
 tests/misc/env-signal-handler.sh | 146 +++
 7 files changed, 355 insertions(+), 1 deletion(-)
 create mode 100755 tests/misc/env-signal-handler.sh

diff --git a/NEWS b/NEWS
index fdde47593..5a8e8a3de 100644
--- a/NEWS
+++ b/NEWS
@@ -67,6 +67,9 @@ GNU coreutils NEWS-*- outline -*-
   test now supports the '-N FILE' unary operator (like e.g. bash) to check
   whether FILE exists and has been modified since it was last read.
 
+  env now supports '--default-signal[=SIG]' and '--ignore-signal[=SIG]'
+  options to set signal handlers before executing a program.
+
 ** New commands
 
   basenc is added to complement existing base64,base32 commands,
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index be35de490..57b209e07 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -17227,6 +17227,49 @@ chroot /chroot env --chdir=/srv true
 env --chdir=/build FOO=bar timeout 5 true
 @end example
 
+@item --default-signal[=@var{sig}]
+Reset signal @var{sig} to its default signal handler. Without @var{sig} all
+known signals are reset to their defaults. Multiple signals can be
+comma-separated. The following command runs @command{seq} with SIGINT and
+SIGPIPE set to their default (which is to terminate the program):
+
+@example
+env --default-signal=PIPE,INT seq 1000 | head -n1
+@end example
+
+In the following example:
+
+@example
+trap '' PIPE && sh -c 'trap - PIPE ; seq inf | head -n1'
+@end example
+
+The first trap command sets SIGPIPE to ignore.  The second trap command
+ostensibly sets it back to its default, but POSIX mandates that the shell
+must not change the inherited state of the signal, so it is a no-op.
+
+@option{--default-signal=PIPE} (or its shortcut @option{-p}) can be
+used to force the signal to its default behavior:
+
+@example
+trap '' PIPE && sh -c 'env -p seq inf | head -n1'
+@end example
+
+
+@item --ignore-signal[=@var{sig}]
+Ignore signal @var{sig} when running a program. Without @var{sig} all
+known signals are set to ignore. Multiple signals can be
+comma-separated. The following command runs @command{seq} with SIGINT set
+to be ignored - pressing @kbd{Ctrl-C} will not terminate it:
+
+@example
+env --ignore-signal=INT seq inf > /dev/null
+@end example
+
+
+@item -p
+Equivalent to @option{--default-signal=PIPE} - sets SIGPIPE to its default
+behavior (terminate a program upon SIGPIPE).
+
 @item -v
 @itemx --debug
 @opindex -v
diff --git a/man/env.x b/man/env.x
index 8eea79655..5e0ef975e 100644
--- a/man/env.x
+++ b/man/env.x
@@ 

bug#33468: A bug with yes and --help

2019-02-18 Thread Assaf Gordon

Hello,

On 2019-02-15 1:19 p.m., Eric Blake wrote:
> On 2/15/19 12:32 PM, Assaf Gordon wrote:
>> There is at least one change in behavior, not sure if this is
>> bad enough to be a regression or doesn't really matter:
>>
>>    $ yes-OLD me -- --help | head -n1
>>    me -- --help
>>
>>    $ yes-NEW me -- --help | head -n1
>>    me --help
>
> I would argue bug-fix.
>
> [...]
>
> So, I would suspect (although I have not yet tested) that as patched, you
> would get:
>
> $ yes-NEW me -- --help | head -n1
> me --help
> $ POSIXLY_CORRECT=1 yes-NEW me -- --help | head -n1
> me -- --help
> $ yes-NEW -- me -- --help
> me -- --help


Indeed - that's how it behaves with the patch.

Thanks for explaining.
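
To spell out the three cases above, here is a small sh model of which
operands survive. It is only a sketch of the behaviour described in this
thread, not the getopt code itself: the function name "operands" is made up,
and it assumes no real options (such as --help) appear before the first "--".

```sh
# Model only: print the operands left after "--" handling.
#   scan_all=true   mimics GNU permutation: the first "--" anywhere is
#                   consumed as the end-of-options marker.
#   scan_all=false  mimics "+" / POSIXLY_CORRECT: scanning stops at the
#                   first non-option, so "--" is consumed only when it is
#                   the very first argument.
operands() {
  scan_all=$1
  shift
  out=
  seen=false
  first=true
  for arg in "$@"; do
    if [ "$seen" = false ] && [ "$arg" = "--" ] &&
       { [ "$scan_all" = true ] || [ "$first" = true ]; }; then
      seen=true
    else
      out="${out:+$out }$arg"
    fi
    first=false
  done
  printf '%s\n' "$out"
}

operands true  me -- --help      # like: yes-NEW me -- --help
operands false me -- --help      # like: POSIXLY_CORRECT=1 yes-NEW me -- --help
operands true  -- me -- --help   # like: yes-NEW -- me -- --help
```

The three calls print "me --help", "me -- --help" and "me -- --help",
matching the outputs quoted above.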


> In the gnulib patch:
> s/optional/option/
>
> In the coreutils patch:
> s/non-options/non-option/

Attached updates with your suggested fixes.

> Also, all coreutils callers pass reset_optind==false; does the gnulib
> interface still need to provide a reset_optind parameter, given that
> setting the parameter true forces reliance on the getopt-gnu module as
> currently coded?


The "getopt-gnu" was already a dependency before this patch,
not sure if removing this parameter will save much hassle - what do you
think ?

-assaf


From 08d0505683cebed0fc10cff082255fd79da2d989 Mon Sep 17 00:00:00 2001
From: Bernhard Voelker 
Date: Thu, 29 Nov 2018 09:06:26 +0100
Subject: [PATCH] long-options: add parse_gnu_standard_options_only

Discussed in https://bugs.gnu.org/33468 .

* lib/long-options.c (parse_long_options): Use EXIT_SUCCESS instead
of 0.
(parse_gnu_standard_options_only): Add function to
process the GNU default options --help and --version and fail for any other
unknown long or short option. See
https://gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html .
* lib/long-options.h (parse_gnu_standard_options_only): Declare it.
* modules/long-options (depends-on): Add stdbool, exitfail.
* top/maint.mk (sc_prohibit_long_options_without_use): Update
syntax-check rule, add new function name.
---
 lib/long-options.c   | 68 +++-
 lib/long-options.h   | 17 +
 modules/long-options |  2 ++
 top/maint.mk         |  2 +-
 4 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/lib/long-options.c b/lib/long-options.c
index 037f74b3a..b7acdb040 100644
--- a/lib/long-options.c
+++ b/lib/long-options.c
@@ -29,6 +29,7 @@
 #include 
 
 #include "version-etc.h"
+#include "exitfail.h"
 
 static struct option const long_options[] =
 {
@@ -71,7 +72,7 @@ parse_long_options (int argc,
 va_list authors;
 va_start (authors, usage_func);
 version_etc_va (stdout, command_name, package, version, authors);
-exit (0);
+exit (EXIT_SUCCESS);
   }
 
 default:
@@ -87,3 +88,68 @@ parse_long_options (int argc,
  the probably-new parameters when/if getopt is called later.  */
   optind = 0;
 }
+
+/* Process the GNU default long options --help and --version (see also
+   https://gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html),
+   and fail for any other unknown long or short option.
+   Use with SCAN_ALL=true to scan until "--", or with SCAN_ALL=false to stop
+   at the first non-option argument (or "--", whichever comes first).
+
+   If RESET_OPTIND=true, the global optind variable will be reset to zero,
+   preparing (and requiring) a follow-up gnu-compatible getopt() call
+   (non-gnu getopt functions use optreset=optind=1 instead of 0 for reset).
+
+   If RESET_OPTIND=false, optind is left as-is (suitable for programs
+   which do not process further option parameters, but could still
+   process parameters directly by examining argv[optind]).  */
+void
+parse_gnu_standard_options_only (int argc,
+                                 char **argv,
+                                 const char *command_name,
+                                 const char *package,
+                                 const char *version,
+                                 bool scan_all,
+                                 bool reset_optind,
+                                 void (*usage_func) (int),
+                                 /* const char *author1, ...*/ ...)
+{
+  int c;
+  int saved_opterr;
+
+  saved_opterr = opterr;
+
+  /* Print an error message for unrecognized options.  */
+  opterr = 1;
+
+  const char *optstring = scan_all ? "" : "+";
+
+  if ((c = getopt_long (argc, argv, optstring, long_options, NULL)) != -1)
+    {
+      switch (c)
+        {
+        case 'h':
+          (*usage_func) (EXIT_SUCCESS);
+          break;
+
+        case 'v':
+          {
+            va_list authors;
+            va_start (authors, usage_func);
+            version_etc_va (stdout, command_name, package, version, authors);
+            exit (EXIT_SUCCESS);
+          }
+
+        default:
+          (*usage_func) (exit_failure);
+          break;
+        }
+    }
+
+  /* Restore previous value.  */
+  opterr = saved_opterr;
+
+  /* Reset this to zero so 

bug#34488: Add sort --limit, or document workarounds for sort|head error messages

2019-02-18 Thread Eric Blake
On 2/17/19 8:20 PM, Pádraig Brady wrote:
> On 15/02/19 07:20, Eric Blake wrote:
>> Except that POSIX has the nasty requirement that sh started with an
>> inherited ignored SIGPIPE must silently ignore all attempts from within
>> the shell to restore SIGPIPE handling to child processes of the shell:
>>
>> $ (trap '' PIPE; bash -c 'trap - PIPE; \
>>seq  | sort -n | sed 5q | wc -l')
>> 5
>> sort: write failed: 'standard output': Broken pipe
>> sort: write error
> 
>> You HAVE to use some other intermediate program if you want to override
>> an inherited ignored SIGPIPE in sh into an inherited default-behavior
>> SIGPIPE in sort.
> 
> Should we also propose to POSIX to allow trap to specify default?

That's what "trap - PIPE" is already supposed to do, except that POSIX
has the odd requirement that a signal that was inherited ignored cannot
be reset to default.
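
A small sh sketch that makes the effect visible: it reports the exit status
of the writer in a pipeline whose reader exits early, after the inner shell
has tried "trap - PIPE" with SIGPIPE inherited as ignored. The helper name
writer_status is made up for illustration, and the commented-out env call
assumes the --default-signal option proposed in this thread is installed.

```sh
# Print the exit status of a writer command after its reader (head) exits.
# The outer subshell ignores SIGPIPE; the inner sh then attempts to restore
# the default with "trap - PIPE" before running the pipeline.
writer_status() {
  (
    trap '' PIPE
    sh -c 'trap - PIPE
           { "$@" 2>/dev/null; echo "$?" >&3; } | head -n 1 >/dev/null' \
      sh "$@" 3>&1
  )
}

# In a shell that follows the POSIX rule, the reset is a no-op: the writer
# sees a write error (EPIPE) rather than being killed by SIGPIPE, so the
# status is not 141 (128 + 13).
writer_status yes

# With an intermediate program resetting the disposition, the writer would
# be expected to die from SIGPIPE instead (not run here):
# writer_status env --default-signal=PIPE yes
```

A status of 141 in the first call would mean the shell did restore the
default disposition; in shells that follow the POSIX requirement it does not.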

> Maybe `trap 0 PIPE` or similar?

Alas, bash has already defined that to mean the same as 'trap - EXIT PIPE'.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org