bug#34488: Add sort --limit, or document workarounds for sort|head error messages

2019-02-25 Thread Pádraig Brady
Thanks for doing all that.
I've attached a few changes:

- spelling fixes
- usage() clarified/reordered
- ensure sigset_t are initialized
- Don't call sigprocmask() unless specified
- Simplified SETMASK_SIGNAL_OPTION handling
- The test missed `env` as a prerequisite
- The test was slow/spun cpu, so used sleep instead of seq
- Used $SHELL in case sh didn't support trap

I see that the last signal that `kill -l` lists is not included.
I think we should be processing SIGNUM_BOUND also?

cheers,
Pádraig
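
The two mask-related items in the list above (initialize every sigset_t, and leave the process mask alone unless an option asked for a change) can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: block_signals, unblock_signals and sig_mask_changed are the names visible in the diff, while init_signal_sets, request_block and apply_signal_mask are hypothetical helpers invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Names taken from the patch; how they are used here is an assumption.  */
static sigset_t block_signals;
static sigset_t unblock_signals;
static bool sig_mask_changed;

/* POSIX requires sigemptyset or sigfillset to be called on a sigset_t
   before any other use, so always start from empty sets.  */
static void
init_signal_sets (void)
{
  sigemptyset (&block_signals);
  sigemptyset (&unblock_signals);
  sig_mask_changed = false;
}

/* Hypothetical helper: record a request to block SIG.  */
static void
request_block (int sig)
{
  sigaddset (&block_signals, sig);
  sigdelset (&unblock_signals, sig);
  sig_mask_changed = true;
}

/* Touch the process mask only if a request was recorded.
   Return 0 on success or when there was nothing to do, -1 on error.  */
static int
apply_signal_mask (void)
{
  if (!sig_mask_changed)
    return 0;
  if (sigprocmask (SIG_BLOCK, &block_signals, NULL) != 0)
    return -1;
  return sigprocmask (SIG_UNBLOCK, &unblock_signals, NULL);
}
```

A caller would run init_signal_sets () once at startup, call request_block () while parsing options, and call apply_signal_mask () just before executing the command.
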
From f54e67f2a9dcc4db287c31969e99899582f53a88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 25 Feb 2019 22:30:09 -0800
Subject: [PATCH] env: misc signal handling fixes

spelling fixes
usage() clarified/reordered
ensure sigset_t are initialized
Don't call sigprocmask() unless specified
Simplified SETMASK_SIGNAL_OPTION handling
test missed `env` as a prerequisite
test was slow/spun cpu, so used sleep instead of seq
Used $SHELL in case sh didn't support trap
---
 NEWS |  4 ++--
 doc/coreutils.texi   |  6 +++---
 man/env.x|  4 ++--
 src/env.c| 46 +++-
 tests/misc/env-signal-handler.sh | 33 
 5 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/NEWS b/NEWS
index c310d1f..e090c72 100644
--- a/NEWS
+++ b/NEWS
@@ -67,10 +67,10 @@ GNU coreutils NEWS-*- outline -*-
   test now supports the '-N FILE' unary operator (like e.g. bash) to check
   whether FILE exists and has been modified since it was last read.
 
-  env now supports '--default-singal[=SIG]' and '--ignore-signal[=SIG]'
+  env now supports '--default-signal[=SIG]' and '--ignore-signal[=SIG]'
   options to set signal handlers before executing a program.
 
-  env now supports '--{block,unblock,setmask}-singal[=SIG]' to block/unblock
+  env now supports '--{block,unblock,setmask}-signal[=SIG]' to block/unblock
   signal delivery before executing a program.
 
   env now supports '--list-signal-actions' and '--list-blocked-signals'
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5199b83..30a5990 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -17274,7 +17274,7 @@ Most operating systems do not allow ignoring @samp{SIGKILL}, @samp{SIGSTOP}
 
 Multiple (and contradictory) @option{--default-signal=SIG} and
 @option{--ignore-signal=SIG} options are processed left-to-right,
-with the latter taking precedence. In the follwing example, @samp{SIGPIPE} is
+with the latter taking precedence. In the following example, @samp{SIGPIPE} is
 set to default while @samp{SIGINT} is ignored:
 
 @example
@@ -17282,10 +17282,10 @@ env --default-signal=INT,PIPE --ignore-signal=INT
 @end example
 
 @item --block-signal[=@var{sig}]
-Block signal @var{sig} from being delivered.
+Block signal(s) @var{sig} from being delivered.
 
 @item --unblock-signal[=@var{sig}]
-Unblock signal @var{sig}.
+Unblock signal(s) @var{sig}.
 
 @item --setmask-signal[=@var{sig}]
 Set list of blocked signals to @var{sig}. All other signals will be unblocked.
diff --git a/man/env.x b/man/env.x
index b787fe3..b2eb371 100644
--- a/man/env.x
+++ b/man/env.x
@@ -38,7 +38,7 @@ parameter the script will likely fail with:
 .PP
 See the full documentation for more details.
 .PP
-.SS "\-\-default-signal[=SIG]" to 'untrap' a singal
+.SS "\-\-default-signal[=SIG]" to 'untrap' a signal
 This option allows setting a signal handler to its default
 action. This is useful to reset a signal after setting it
 to 'ignore' using the shell's trap command.
@@ -87,7 +87,7 @@ Multiple (and contradictory)
 and
 .B \-\-ignore\-signal=SIG
 options are processed left-to-right, with the latter taking precedence.
-In the follwing example, SIGPIPE is set to default while SIGINT is ignored:
+In the following example, SIGPIPE is set to default while SIGINT is ignored:
 .RS
 .nf
 env \-\-default\-signal=INT,PIPE \-\-ignore\-signal=INT
diff --git a/src/env.c b/src/env.c
index 4385620..1acfc11 100644
--- a/src/env.c
+++ b/src/env.c
@@ -67,6 +67,8 @@ static sigset_t block_signals;
 /* Set of signals to unblock.  */
 static sigset_t unblock_signals;
 
+/* Whether signal mask adjustment requested.  */
+static bool sig_mask_changed;
 
 static char const shortopts[] = "+C:ipS:u:v0 \t";
 
@@ -125,35 +127,32 @@ Set each NAME to VALUE in the environment and run COMMAND.\n\
   -u, --unset=NAME remove variable from the environment\n\
 "), stdout);
   fputs (_("\
-  --block-signal[=SIG]block signal SIG.\n\
+  -C, --chdir=DIR  change working directory to DIR\n\
 "), stdout);
   fputs (_("\
-  --unblock-signal[=SIG]  unblock signal SIG.\n\
+  -S, --split-string=S  process and split S into separate arguments;\n\
+used to pass multiple arguments on shebang lines\n\
 "), stdout);
   fputs (_("\
-  --setmask-signal[=SIG]  set blocked signal(s) mask to SIG.\n\
+  

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-25 Thread Pádraig Brady
On 24/02/19 19:55, Pádraig Brady wrote:
> On 24/02/19 17:07, Pádraig Brady wrote:
>> So non break space is generally considered a word delimiter,
>> though there are complications you detail from unicode.
>>
>> In regard to options for enabling various behaviors for wc(1),
>> I'm thinking we might keep the strict POSIX isspace() behavior
>> with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbspace()
>> by default, since that's the most common operation one would want,
>> and is consistent with libreoffice for example.
>> I'll adjust the patch along those lines.
> 
> Full patch attached.

Updated patch attached. I'll push in a few hours.
Marking this bug as done.

cheers,
Pádraig.
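
The effect of the change can be sketched in isolation as below. This is a simplified illustration rather than the wc.c code: it works on an in-memory wide string, ignores wc's handling of non-printable characters, and passes the POSIXLY_CORRECT setting as a parameter. The four code points are the ones listed in the patch; is_nbspace and count_words are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Return true if WC is one of the four no-break space code points
   listed in the patch, unless POSIXLY_CORRECT behavior is requested.  */
static bool
is_nbspace (wint_t wc, bool posixly_correct)
{
  return !posixly_correct
         && (wc == 0x00A0 || wc == 0x2007
             || wc == 0x202F || wc == 0x2060);
}

/* Count words in the wide string S, treating both iswspace()
   characters and no-break spaces as separators.  */
static size_t
count_words (const wchar_t *s, bool posixly_correct)
{
  size_t words = 0;
  bool in_word = false;

  for (; *s != L'\0'; s++)
    {
      wint_t wc = (wint_t) *s;
      if (iswspace (wc) || is_nbspace (wc, posixly_correct))
        in_word = false;
      else if (!in_word)
        {
          /* First character of a new word.  */
          in_word = true;
          words++;
        }
    }
  return words;
}
```

With posixly_correct false, a string such as "a", U+00A0, "b" is counted as two words. With it true, the result depends on whether the locale's iswspace() classifies U+00A0 as a space.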

From c04ff0df5dfe788a38162cb2609b38495e765383 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Sat, 23 Feb 2019 21:23:47 -0800
Subject: [PATCH] wc: treat non break space as a word separator

* src/wc.c (iswnbspace): A new function to match
characters in this class.
(main): Initialize posixly_correct from the environment,
to allow disabling honoring NBSP in non C locales.
(wc): Call is[w]nbspace() as well as is[w]space.
* bootstrap.conf: Ensure btowc is available.
* tests/misc/wc-nbsp.sh: A new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the change in behavior.
---
 NEWS  |  3 +++
 bootstrap.conf|  1 +
 src/wc.c  | 25 +++--
 tests/local.mk|  1 +
 tests/misc/wc-nbsp.sh | 42 ++
 5 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100755 tests/misc/wc-nbsp.sh

diff --git a/NEWS b/NEWS
index e400554..9bfa3c3 100644
--- a/NEWS
+++ b/NEWS
@@ -53,6 +53,9 @@ GNU coreutils NEWS-*- outline -*-
   operator, so POSIX changed this to 'test -e FILE'.  Scripts using it were
   already broken and non-portable; the -a unary operator was never documented.
 
+  wc now treats non breaking space characters as word delimiters
+  unless the POSIXLY_CORRECT environment variable is set.
+
 ** New features
 
   id now supports specifying multiple users.
diff --git a/bootstrap.conf b/bootstrap.conf
index a525ef4..4926152 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -38,6 +38,7 @@ gnulib_modules="
   backup-rename
   base32
   base64
+  btowc
   buffer-lcm
   c-strcase
   cl-strtod
diff --git a/src/wc.c b/src/wc.c
index 179abbe..2381804 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -74,6 +74,9 @@ static bool have_read_stdin;
 /* Used to determine if file size can be determined without reading.  */
 static size_t page_size;
 
+/* Enable to _not_ treat non breaking space as a word separator.  */
+static bool posixly_correct;
+
 /* The result of calling fstat or stat on a file descriptor or file.  */
 struct fstatus
 {
@@ -147,6 +150,21 @@ the following order: newline, word, character, byte, maximum line length.\n\
   exit (status);
 }
 
+/* Return non zero if a non breaking space.  */
+static int _GL_ATTRIBUTE_PURE
+iswnbspace (wint_t wc)
+{
+  return ! posixly_correct
+ && (wc == 0x00A0 || wc == 0x2007
+ || wc == 0x202F || wc == 0x2060);
+}
+
+static int
+isnbspace (int c)
+{
+  return iswnbspace (btowc (c));
+}
+
 /* FILE is the name of the file (or NULL for standard input)
associated with the specified counters.  */
 static void
@@ -455,7 +473,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
   if (width > 0)
 linepos += width;
 }
-  if (iswspace (wide_char))
+  if (iswspace (wide_char) || iswnbspace (wide_char))
 goto mb_word_separator;
   in_word = true;
 }
@@ -538,7 +556,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
   if (isprint (to_uchar (p[-1])))
 {
   linepos++;
-  if (isspace (to_uchar (p[-1])))
+  if (isspace (to_uchar (p[-1]))
+  || isnbspace (to_uchar (p[-1])))
 goto word_separator;
   in_word = true;
 }
@@ -681,6 +700,8 @@ main (int argc, char **argv)
  so that processes running in parallel do not intersperse their output.  */
   setvbuf (stdout, NULL, _IOLBF, 0);
 
+  posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
+
   print_lines = print_words = print_chars = print_bytes = false;
   print_linelength = false;
   total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
diff --git a/tests/local.mk b/tests/local.mk
index 4751886..bacc5d2 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -272,6 +272,7 @@ all_tests =	\
   tests/misc/wc.pl\
   tests/misc/wc-files0-from.pl			\
   tests/misc/wc-files0.sh			\
+  tests/misc/wc-nbsp.sh\
   tests/misc/wc-parallel.sh			

bug#34488: Add sort --limit, or document workarounds for sort|head error messages

2019-02-25 Thread Assaf Gordon

Hello,

Thanks for all comments.

On 2019-02-24 11:33 a.m., Paul Eggert wrote:
> Thanks for doing all that. Although Pádraig is not enthusiastic about a
> shortcut like -p, I'm a bit warmer to it, as it's an important special
> case to fix a wart in POSIX. No big deal either way.

For now I kept "-p"; it can be removed later, of course.
The first patch includes Pádraig's recent suggestions (slightly modified).

> The documentation should mention that SIGCHLD is special [...]

> The documentation should say what happens if mutually-contradictory
> options are specified, [...]

> The documentation should echo this suggestion in
> :

I've added those, and I welcome any suggested improvements to
grammar, phrasing, etc.


> There should be options --block-signal[=SIG], --unblock-signal[=SIG],
> and --setmask-signal[=SIG] that affect the signal mask, which is also
> inherited by the child. These can be implemented via pthread_sigmask.

The second patch adds these new options (separated to ease review).
As for documentation - I'm not sure what to add beyond the basic
option description. When should these be used?
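
To illustrate the "inherited by the child" part of the quoted suggestion, here is a minimal sketch that blocks a signal, forks, and lets the child report its own mask. It is not taken from the patch: child_inherits_blocked is a hypothetical helper, and fork() stands in for the exec that env performs (POSIX preserves the signal mask across both).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block SIG in the calling process, fork, and report what the child
   sees.  Return 1 if the child finds SIG blocked, 0 if it does not,
   and -1 on error.  SIG stays blocked in the caller afterwards.  */
static int
child_inherits_blocked (int sig)
{
  sigset_t set;
  sigemptyset (&set);
  if (sigaddset (&set, sig) != 0
      || sigprocmask (SIG_BLOCK, &set, NULL) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      /* Child: query the mask without changing it.  */
      sigset_t cur;
      if (sigprocmask (SIG_BLOCK, NULL, &cur) != 0)
        _exit (2);
      _exit (sigismember (&cur, sig) == 1 ? 0 : 1);
    }

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  if (WEXITSTATUS (status) == 0)
    return 1;
  return WEXITSTATUS (status) == 1 ? 0 : -1;
}
```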

A third small patch adds "env --list-signal-actions" and
"env --list-blocked-signals" - to ease diagnostics.
Might be worth adding for completeness (e.g., for users who
need to somehow know if SIGPIPE is being ignored by the shell
or not):

$ ( trap '' PIPE && src/env --list-signal-actions )
PIPE   (13): ignore
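
For reference, the underlying query is simple. A minimal sketch follows, assuming only that sigaction() is called with a null new action to read the current disposition; signal_is_ignored is a hypothetical name, and the patch itself may do this differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Return 1 if SIG is currently ignored, 0 if it has any other
   disposition, and -1 if the disposition cannot be queried.  */
static int
signal_is_ignored (int sig)
{
  struct sigaction act;

  /* A null second argument reads the action without changing it.  */
  if (sigaction (sig, NULL, &act) != 0)
    return -1;
  return act.sa_handler == SIG_IGN;
}
```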

Comments are very welcome,
 - assaf



From 02cba657e2f63c05f859daf18a7d1032fdc32c6f Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Fri, 15 Feb 2019 12:31:48 -0700
Subject: [PATCH 1/3] env: new options
 -p/--default-signal=SIG/--ignore-signal=SIG

New options to set signal handlers to default (SIG_DFL) or ignore
(SIG_IGN). This is useful to overcome the POSIX limitation that a shell
must not override inherited signal state, e.g. the second 'trap' here is
a no-op:

   trap '' PIPE && sh -c 'trap - PIPE ; seq inf | head -n1'

Instead use:

   trap '' PIPE && sh -c 'env -p seq inf | head -n1'

Similarly, the following will prevent CTRL-C from terminating the
program:

   env --ignore-signal=INT seq inf > /dev/null

See https://bugs.gnu.org/34488#8 .

* NEWS: Mention new options.
* doc/coreutils.texi (env invocation): Document new options.
* man/env.x: Add example of --default-signal=SIG usage.
* src/env.c (signals): New global variable.
(shortopts,longopts): Add new options.
(usage): Print new options.
(parse_signal_params): Parse comma-separated list of signals, store in
signals variable.
(reset_signal_handlers): Set each signal to SIG_DFL/SIG_IGN.
(main): Process new options.
* src/local.mk (src_env_SOURCES): Add operand2sig.c.
* tests/misc/env-signal-handler.sh: New test.
* tests/local.mk (all_tests): Add new test.
---
 NEWS |   3 +
 doc/coreutils.texi   |  58 
 man/env.x|  69 ++
 src/env.c| 138 +++-
 src/local.mk |   1 +
 tests/local.mk   |   1 +
 tests/misc/env-signal-handler.sh | 146 +++
 7 files changed, 415 insertions(+), 1 deletion(-)
 create mode 100755 tests/misc/env-signal-handler.sh

diff --git a/NEWS b/NEWS
index e73cb52b8..ddbbaf138 100644
--- a/NEWS
+++ b/NEWS
@@ -81,6 +81,9 @@ GNU coreutils NEWS-*- outline -*-
   test now supports the '-N FILE' unary operator (like e.g. bash) to check
   whether FILE exists and has been modified since it was last read.
 
+  env now supports '--default-singal[=SIG]' and '--ignore-signal[=SIG]'
+  options to set signal handlers before executing a program.
+
 ** New commands
 
   basenc is added to complement existing base64,base32 commands,
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index eb1848882..c2c202b28 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -17246,6 +17246,64 @@ chroot /chroot env --chdir=/srv true
 env --chdir=/build FOO=bar timeout 5 true
 @end example
 
+@item --default-signal[=@var{sig}]
+Reset signal @var{sig} to its default signal handler. Without @var{sig} all
+known signals are reset to their defaults. Multiple signals can be
+comma-separated. The following command runs @command{seq} with SIGINT and
+SIGPIPE set to their default (which is to terminate the program):
+
+@example
+env --default-signal=PIPE,INT seq 1000 | head -n1
+@end example
+
+In the following example:
+
+@example
+trap '' PIPE && sh -c 'trap - PIPE ; seq inf | head -n1'
+@end example
+
+The first trap command sets SIGPIPE to ignore.  The second trap command
+ostensibly sets it back to its  default, but POSIX mandates that the shell
+must not change inherited state of the signal - so it is a no-op.
+
+Using @option{--default-signal=PIPE} (or its shortcut @option{-p}) can be
+used to force the signal to  its  default 

bug#34488: Add sort --limit, or document workarounds for sort|head error messages

2019-02-25 Thread Eric Blake
On 2/23/19 11:32 PM, Pádraig Brady wrote:

>>>> You HAVE to use some other intermediate program if you want to override
>>>> an inherited ignored SIGPIPE in sh into an inherited default-behavior
>>>> SIGPIPE in sort.
>>>
>>> Should we also propose to POSIX to allow trap to specify default?
>>
>> That's what "trap - PIPE" is already supposed to do, except that POSIX
>> has the odd requirement that a signal that was inherited ignored cannot
>> be reset to default.
>>
>>> Maybe `trap 0 PIPE` or similar?
>>
>> Alas, bash has already defined that to mean the same as 'trap - EXIT PIPE'.
> 
> Fair enough, but do we agree that it would be good
> to have functionality in the shell with some similar syntax
> that resets the handler to system default?

Worth asking on the bash list to see if Chet has any interest in such an
extension (POSIX is reluctant to specify something that doesn't have
existing implementation practice).
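
Until a shell offers that, the "intermediate program" mentioned in the quoted text only has to do two things: restore the default action and exec the real command. A minimal sketch, with hypothetical function names and no claim to match how env implements --default-signal:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set the action for SIG back to the system default.
   Return 0 on success, -1 on error.  */
static int
reset_signal_to_default (int sig)
{
  struct sigaction act;
  act.sa_handler = SIG_DFL;
  sigemptyset (&act.sa_mask);
  act.sa_flags = 0;
  return sigaction (sig, &act, NULL);
}

/* Reset SIG, then replace the current process with ARGV[0].
   Returns -1 only if the reset or the exec failed.  */
static int
exec_with_default_signal (int sig, char *const argv[])
{
  if (reset_signal_to_default (sig) != 0)
    return -1;
  execvp (argv[0], argv);
  return -1;  /* execvp returns only on failure */
}
```

Because the reset happens in a non-shell process, the POSIX rule about inherited ignored signals in sh does not get in the way, and the default disposition is what the exec'd command inherits.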

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





bug#34608: date +%+4C is unimplemented, contrary to POSIX

2019-02-25 Thread Paul Eggert

Eric Blake wrote:

> The best I can do is search austingroupbugs.net for all bugs containing
> strftime, which pulls up a current total of 25 issues.


Thanks for doing that. I read through those issues, and the only one that seems 
relevant (i.e., resolved by austingroup but not in Gnulib) is the business with 
the + conversion specification flag. I installed this Gnulib patch to implement 
the flag:


https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=188d87b05190690d6f8b0577ec65ef221a711d08

Then I installed the attached coreutils patches to propagate the Gnulib patch 
into coreutils, and I am marking this coreutils bug as done.


glibc strftime should also support the + conversion specification flag. I filed 
a bug report for that here:


https://sourceware.org/bugzilla/show_bug.cgi?id=24264
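
As a rough illustration of the century rule documented in patch 2/3 below (pad with zeros, and precede the century with '+' if it exceeds 99 or the field width exceeds 2), here is a small standalone sketch. It is not the Gnulib implementation; format_century is a hypothetical function, it assumes the '+' occupies one column of the requested width (consistent with the '.+019.' expectation in the new test), and it only applies the sign to non-negative years.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format the century of YEAR into BUF (capacity SIZE).  WIDTH is the
   requested field width, or 0 if none was given.  PLUS_FLAG says
   whether the '+' flag was used.  Returns what snprintf returns.  */
static int
format_century (char *buf, size_t size, long year, int width,
                bool plus_flag)
{
  long century = year / 100;
  int min_width = width > 2 ? width : 2;

  if (plus_flag && year >= 0 && (century > 99 || width > 2))
    /* One column of the field is taken by the sign.  */
    return snprintf (buf, size, "+%0*ld", min_width - 1, century);

  return snprintf (buf, size, "%0*ld", min_width, century);
}
```

For the year 1970 with a width of 4 and the '+' flag, the sketch yields "+019"; with no width it yields "19" whether or not the flag is given.
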
From c50c4ab9e16df9368d52ed8cada3f3f2e32f093a Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 24 Feb 2019 23:35:23 -0800
Subject: [PATCH 1/3] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index e3970fb98..188d87b05 16
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit e3970fb9891668bd9dbc94daca18dc0d42b7e466
+Subproject commit 188d87b05190690d6f8b0577ec65ef221a711d08
-- 
2.17.1

From b1e7af28ebc4b7218b2a76eee1dca738d3224a63 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 24 Feb 2019 23:59:22 -0800
Subject: [PATCH 2/3] =?UTF-8?q?date:=20=E2=80=98+=E2=80=99=20conversion=20?=
 =?UTF-8?q?specification=20flag?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The recent Gnulib update fixed Bug#34608; document and test this.
* NEWS: Mention the change.
* doc/coreutils.texi (Padding and other flags):
Update doc to cover new flag and other POSIX.1-2017 changes.
* tests/misc/date.pl (date-century-plus): New test.
---
 NEWS   |  3 +++
 doc/coreutils.texi | 15 ---
 tests/misc/date.pl |  3 +++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 5d9956854..e73cb52b8 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,9 @@ GNU coreutils NEWS-*- outline -*-
   after asking the user whether to proceed.
   [This bug was present in "the beginning".]
 
+  'date' now supports the '+' conversion specification flag,
+  introduced in POSIX.1-2017.
+
   df no longer corrupts displayed multibyte characters on macOS.
   [bug introduced with coreutils-8.18]
 
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 028371673..d1427323c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -15985,24 +15985,33 @@ example, numeric months are always output as two digits.
 Seconds since the epoch are not padded, though,
 since there is no natural width for them.
 
-As a GNU extension, @command{date} recognizes any of the
-following optional flags after the @samp{%}:
+The following optional flags can appear after the @samp{%}:
 
 @table @samp
 @item -
 (hyphen) Do not pad the field; useful if the output is intended for
 human consumption.
+This is a GNU extension.
 @item _
 (underscore) Pad with spaces; useful if you need a fixed
 number of characters in the output, but zeros are too distracting.
+This is a GNU extension.
 @item 0
 (zero) Pad with zeros even if the conversion specifier
 would normally pad with spaces.
+@item +
+Pad with zeros, like @samp{0}.  In addition, precede any year number
+with @samp{+} if it exceeds  or if its field width exceeds 4;
+similarly, precede any century number with @samp{+} if it exceeds 99
+or if its field width exceeds 2.  Preceding with @samp{+} is helpful
+for generationg some ISO 8601 formats.
 @item ^
 Use upper case characters if possible.
+This is a GNU extension.
 @item #
 Use opposite case characters if possible.
 A field that is normally upper case becomes lower case, and vice versa.
+This is a GNU extension.
 @end table
 
 @noindent
@@ -16017,7 +16026,7 @@ date +%_d/%_m -d "Feb 1"
 @result{}  1/ 2
 @end example
 
-As a GNU extension, you can specify the field width
+You can optionally specify the field width
 (after any flag, if present) as a decimal number.  If the natural size of the
 output of the field has less than the specified number of characters,
 the result is written right adjusted and padded to the given
diff --git a/tests/misc/date.pl b/tests/misc/date.pl
index 5e12158e9..9ba3d3983 100755
--- a/tests/misc/date.pl
+++ b/tests/misc/date.pl
@@ -297,6 +297,9 @@ my @Tests =
   {ENV => 'TZ=PST8'},
   {OUT => 'Wed Dec 31 21:00:00 PST 1969'},
  ],
+
+ # https://bugs.gnu.org/34608
+ ['date-century-plus', '-d @0 +.%+4C.', {OUT => '.+019.'}],
 );
 
 # Repeat the cross-dst test, using Jan 1, 2005 and every interval from 1..364.
-- 
2.17.1

From 3e7fd6650e6040c6b09f97d6f189e0365727df90 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 25 Feb 2019 00:19:22 -0800
Subject: [PATCH 3/3] doc: fix typo in previous patch

---