bug#42764: csplit does not suppress the last match when not using {*}

2020-08-08 Thread Pádraig Brady

On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:

Good day,

I am experiencing an odd behaviour in csplit which may actually be a
bug.

I am testing this against the code cloned from
https://github.com/coreutils/coreutils.git, on the commit described by
git as v8.32-52-gc0e5f8c59.

Suppose I have the following YAML file:

==> test.yaml <==
value1: 123
---
value2: 456
---
value3: 789

and I want to split it at '---' lines. First I would try the following:

     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

which outputs:

     12
     12
     16

and creates the following files:

     ==> xx00 <==
     value1: 123

     ==> xx01 <==
     value2: 456

     ==> xx02 <==
     ---
     value3: 789

The last portion still contains the '---', despite it being suppressed
from the second part.

Now, if I try again with:

     csplit -z --suppress-matched test.yaml '/^---$/' '{*}'

I get:

     12
     12
     12

and:

     ==> xx00 <==
     value1: 123

     ==> xx01 <==
     value2: 456

     ==> xx02 <==
     value3: 789

where the last part does not contain the matched line, as expected.

While trying to figure out the problem, I noticed that match suppression
is done at the beginning of process_regexp. For a match-twice scenario
like the first one, the function is called twice, then the rest of the
file is simply dumped by split_file.

This means that the two calls to process_regexp will:

* suppress nothing for call #1 because nothing has been matched yet;
* suppress the first match in call #2.

Then, the rest of the file is dumped but no one actually suppressed the
second match, which appears in the last segment. When using asterisk
repetition, the file is instead dumped by process_regexp, which gets its
chance to suppress the matched line.
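
The ordering problem described above can be illustrated with a small model in C. This is only a sketch, not the code in src/csplit.c: the function name first_line_of_remainder and its parameters are made up for illustration, the pattern is compared with strcmp rather than a regex, and the old placement is modelled with a flag instead of the real current_line > 0 check. It returns the index of the first line that the final dump of the rest of the file would write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of csplit's pattern loop (not the real code).
   CALLS is how many times the pattern is processed, e.g. 2 for '/^---$/' '{1}'.
   Returns the index of the first line left for the final "dump the rest" step. */
size_t
first_line_of_remainder (const char *const *lines, size_t n_lines,
                         const char *separator, int calls,
                         bool suppress_at_end)
{
  size_t cur = 0;               /* next line that would be written */
  bool pending_match = false;   /* lines[cur] is a match not yet removed */

  assert (lines != NULL && separator != NULL);

  for (int i = 0; i < calls; i++)
    {
      /* Old placement: remove the match left behind by the previous call.  */
      if (!suppress_at_end && pending_match)
        {
          cur++;
          pending_match = false;
        }

      /* Lines up to, but not including, the next match go to piece i.  */
      while (cur < n_lines && strcmp (lines[cur], separator) != 0)
        cur++;
      pending_match = cur < n_lines;

      /* New placement: remove the match before returning.  */
      if (suppress_at_end && pending_match)
        {
          cur++;
          pending_match = false;
        }
    }
  return cur;
}
```

For the five-line test.yaml above and calls = 2, the old placement returns the index of the second '---' line, so that line ends up in the last piece, while the new placement returns the index of 'value3: 789'.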

I came up with the attached patch, which simply moves match suppression
to the end of process_regexp. With this modification, the invocation:

     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

now produces:

     12
     12
     12

and:

     ==> xx00 <==
     value1: 123

     ==> xx01 <==
     value2: 456

     ==> xx02 <==
     value3: 789

which is what I would expect.




I agree with this analysis.
The usual manifestation would probably be
when there was only a single match.
I.e., when not specifying a repetition count,
we were not suppressing the single match.

I'll apply the attached in your name later today
(which also adds a test).

Marking this as done.

thanks!
Pádraig
From 7cf45f4f6a093a927d3139c87f52999dd2c750ec Mon Sep 17 00:00:00 2001
From: Emanuele Giacomelli 
Date: Sat, 8 Aug 2020 21:29:13 +0100
Subject: [PATCH] csplit: fix regex suppression with specific match count

* src/csplit.c (process_regexp): Process the line suppression
in all invocations so that the last match is suppressed.
Previously with a non infinite match count,
the last regex pattern was not suppressed.
* NEWS: Mention the bug fix.
* tests/misc/csplit-suppress-matched.pl: Add a test case.
Fixes https://bugs.gnu.org/42764
---
 NEWS                                  |  4 ++++
 src/csplit.c                          |  6 +++---
 tests/misc/csplit-suppress-matched.pl | 12 +++++++++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/NEWS b/NEWS
index 1881de115..61b711611 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   is a non regular file.
   [bug introduced in coreutils-8.6]
 
+  csplit --suppress-matched now elides the last matched line
+  when a specific number of pattern matches are performed.
+  [bug introduced with the --suppress-matched feature in coreutils-8.22]
+
   du no longer crashes on XFS file systems when the directory hierarchy is
   heavily changed during the run.
   [bug introduced in coreutils-8.25]
diff --git a/src/csplit.c b/src/csplit.c
index 9bd9c43b5..93ff60dc6 100644
--- a/src/csplit.c
+++ b/src/csplit.c
@@ -803,9 +803,6 @@ process_regexp (struct control *p, uintmax_t repetition)
   if (!ignore)
     create_output_file ();
 
-  if (suppress_matched && current_line > 0)
-    remove_line ();
-
   /* If there is no offset for the regular expression, or
      it is positive, then it is not necessary to buffer the lines. */
 
@@ -893,6 +890,9 @@ process_regexp (struct control *p, uintmax_t repetition)
 
   if (p->offset > 0)
     current_line = break_line;
+
+  if (suppress_matched)
+    remove_line ();
 }
 
 /* Split the input file according to the control records we have built. */
diff --git a/tests/misc/csplit-suppress-matched.pl b/tests/misc/csplit-suppress-matched.pl
index 80f5299d0..e15ebb0f2 100755
--- a/tests/misc/csplit-suppress-matched.pl
+++ b/tests/misc/csplit-suppress-matched.pl
@@ -67,21 +67,27 @@ my @csplit_tests =
 {OUTPUTS => [ "a\na\nYY\n", "\nXX\nb\nb\nYY\n","\nXX\nc\nYY\n",
   "\nXX\nd\nd\nd\n" ] }],
 
-  # the newline (matched line) does not appears in the output files
+  # the newline (matched line) does not appear in the 

bug#42766: file names with spaces are quoted in the output from ls

2020-08-08 Thread David Thomas
Hello,

I noticed the other day when running ls in a terminal that some file names with 
spaces were quoted. I was quite confused, as I was sure they hadn’t been saved 
with quotes at the beginning and end of the file name. I’m not someone who 
often uses the terminal, but I know enough of what I was expecting (that file 
names created without quotes shouldn’t have quotes around them) that it threw 
me way off. Some of the quoted names were directories, and at first I was 
typing out the quotes to cd into them. Then I discovered it still worked to cd 
into them without typing the quotes, which was at least better. But I really 
don’t want those quotes. They look weird, and they make me feel like I mistyped 
when I created the file or directory.

I did some more digging, and discovered that I can alias ls to ls -N to get rid 
of them, but I sure don’t want to have to make this new alias for every system 
I use from here on out.

I have already read every bit of 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813164#226 and the 
stubbornness of the devs is surprising and disheartening.

Isn’t open source a friendly place? If most people think things are a bad idea, 
why do them? This isn’t something that helps new users—it’s just going to 
confuse them, like it confused me.

Please do us all a favor and revert the change. Thanks so much,

-David


bug#42764: csplit does not suppress the last match when not using {*}

2020-08-08 Thread Emanuele Giacomelli via GNU coreutils Bug Reports
Good day,

I am experiencing an odd behaviour in csplit which may actually be a
bug.

I am testing this against the code cloned from
https://github.com/coreutils/coreutils.git, on the commit described by
git as v8.32-52-gc0e5f8c59.

Suppose I have the following YAML file:

==> test.yaml <==
value1: 123
---
value2: 456
---
value3: 789

and I want to split it at '---' lines. First I would try the following:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

which outputs:

    12
    12
    16

and creates the following files:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    ---
    value3: 789

The last portion still contains the '---', despite it being suppressed
from the second part.

Now, if I try again with:

    csplit -z --suppress-matched test.yaml '/^---$/' '{*}'

I get:

    12
    12
    12

and:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    value3: 789

where the last part does not contain the matched line, as expected.

While trying to figure out the problem, I noticed that match suppression
is done at the beginning of process_regexp. For a match-twice scenario
like the first one, the function is called twice, then the rest of the
file is simply dumped by split_file.

This means that the two calls to process_regexp will:

* suppress nothing for call #1 because nothing has been matched yet;
* suppress the first match in call #2.

Then, the rest of the file is dumped but no one actually suppressed the
second match, which appears in the last segment. When using asterisk
repetition, the file is instead dumped by process_regexp, which gets its
chance to suppress the matched line.

I came up with the attached patch, which simply moves match suppression
to the end of process_regexp. With this modification, the invocation:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

now produces:

    12
    12
    12

and:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    value3: 789

which is what I would expect.

diff --git a/src/csplit.c b/src/csplit.c
index 9bd9c43b5..93ff60dc6 100644
--- a/src/csplit.c
+++ b/src/csplit.c
@@ -803,9 +803,6 @@ process_regexp (struct control *p, uintmax_t repetition)
   if (!ignore)
     create_output_file ();
 
-  if (suppress_matched && current_line > 0)
-    remove_line ();
-
   /* If there is no offset for the regular expression, or
      it is positive, then it is not necessary to buffer the lines. */
 
@@ -893,6 +890,9 @@ process_regexp (struct control *p, uintmax_t repetition)
 
   if (p->offset > 0)
     current_line = break_line;
+
+  if (suppress_matched)
+    remove_line ();
 }
 
 /* Split the input file according to the control records we have built. */