Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Quentin Rameau
> This has nothing to do with getopt(). This is about how the shell parses
> and splits words into arguments. -d""

Erf of course, put that on the fever. ^^



Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Evan Gates
On Thu, Mar 5, 2020, 02:16 Quentin Rameau  wrote:
>
> On Thu, 5 Mar 2020 01:42:37 -0800
> Michael Forney  wrote:
>
> > >> and `-d""` is invalid, since paste(1) must follow the
> > >> utility syntax guidelines (guideline 7).
> > >
> > > Not sure what you mean there, -d"" is the concatenation of -d and
> > > '', which is standard.
> > > Did you quote the correct guideline? “Guideline 7: Option-arguments
> > > should not be optional.” here there's an option-argument, that's an
> > > empty string.
> >
> > I guess it's a combination of 6 and 7. Option arguments must be
> > separate parameters and non-optional (unless specified otherwise).
> > There is an exception made that allows conforming implementations to
> > accept an option adjacent with its option argument in the same
> > argument, but I think this only makes sense if the option-argument is
> > non-empty.
> >
> > POSIX says
> >
> >   The construct '\0' is used to mean "no separator" because historical
> > versions of paste did not follow the syntax guidelines, and the
> > command:
> >
> >   paste -d"" ...
> >
> >   could not be handled properly by getopt().
> >
> > I think this implies that `paste -d""` is no longer valid, due to the
> > requirements of the syntax guidelines.
>
> Rather it's due to a getopt() limitation with the (not so) special case
> of an empty string option-argument, but I'm not sure why.

This has nothing to do with getopt(). This is about how the shell parses
and splits words into arguments. -d"" is exactly the same as -d. It is
the same word, adding an empty string to the end of a string does not
change that original string. There is no argument to -d there. If you
wanted to -d"" to pass two arguments "-d" and "" then you'd have to
write a new shell to do that. As it stands shells pass on "-d" in argv
when they come across -d"".



Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Quentin Rameau
On Thu, 5 Mar 2020 01:42:37 -0800
Michael Forney  wrote:

> On 2020-03-05, Quentin Rameau  wrote:
> >> Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""`
> >> is unspecified  
> >
> > I don't think so, -d "" is just a list with an empty string.
> > So -d '\0' is equivalent to -d '', '\0' is here to let the user
> > express an empty string in a list, which wouldn't be possible
> > otherwise (like how would one specify empty string in a list like
> > 'abcd').  
> 
> Here is a direct quote from POSIX:
> 
> The commands:
> 
> paste -d "\0" ...
> paste -d "" ...
> 
> are not necessarily equivalent; the latter is not specified by
> this volume of POSIX.1-2017 and may result in an error.

My bad, I didn't read that far.

But,

> >> and `-d""` is invalid, since paste(1) must follow the
> >> utility syntax guidelines (guideline 7).  
> >
> > Not sure what you mean there, -d"" is the concatenation of -d and
> > '', which is standard.
> > Did you quote the correct guideline? “Guideline 7: Option-arguments
> > should not be optional.” here there's an option-argument, that's an
> > empty string.  
> 
> I guess it's a combination of 6 and 7. Option arguments must be
> separate parameters and non-optional (unless specified otherwise).
> There is an exception made that allows conforming implementations to
> accept an option adjacent with its option argument in the same
> argument, but I think this only makes sense if the option-argument is
> non-empty.
> 
> POSIX says
> 
>   The construct '\0' is used to mean "no separator" because historical
> versions of paste did not follow the syntax guidelines, and the
> command:
> 
>   paste -d"" ...
> 
>   could not be handled properly by getopt().
> 
> I think this implies that `paste -d""` is no longer valid, due to the
> requirements of the syntax guidelines.

Rather it's due to a getopt() limitation with the (not so) special case
of an empty string option-argument, but I'm not sure why.

The utility guidelines permits concatenating options and
option-arguments, while it's true that the prefered syntax is to
separate them.

12.1, 2.a:
“If the SYNOPSIS of a standard utility shows an option with a mandatory
option-argument (as with [ -c option_argument] in the example), a
conforming application shall use separate arguments for that option and
its option-argument. However, a conforming implementation shall also
permit applications to specify the option and option-argument in the
same argument string without intervening  characters.”

Without commenting on the patch itself, I think it's fine to allow '\0'
and '', especially as we don't use getopt(), partly due to it being
broken in most implementations (and having such limitations?), and the
standards says it's unspecified, not undefinied.



Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Michael Forney
On 2020-03-05, Quentin Rameau  wrote:
>> Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""` is
>> unspecified
>
> I don't think so, -d "" is just a list with an empty string.
> So -d '\0' is equivalent to -d '', '\0' is here to let the user express
> an empty string in a list, which wouldn't be possible otherwise (like
> how would one specify empty string in a list like 'abcd').

Here is a direct quote from POSIX:

The commands:

paste -d "\0" ...
paste -d "" ...

are not necessarily equivalent; the latter is not specified by
this volume of POSIX.1-2017 and may result in an error.

>> and `-d""` is invalid, since paste(1) must follow the
>> utility syntax guidelines (guideline 7).
>
> Not sure what you mean there, -d"" is the concatenation of -d and '',
> which is standard.
> Did you quote the correct guideline? “Guideline 7: Option-arguments
> should not be optional.” here there's an option-argument, that's an
> empty string.

I guess it's a combination of 6 and 7. Option arguments must be
separate parameters and non-optional (unless specified otherwise).
There is an exception made that allows conforming implementations to
accept an option adjacent with its option argument in the same
argument, but I think this only makes sense if the option-argument is
non-empty.

POSIX says

  The construct '\0' is used to mean "no separator" because historical
versions of paste did not follow the syntax guidelines, and the
command:

  paste -d"" ...

  could not be handled properly by getopt().

I think this implies that `paste -d""` is no longer valid, due to the
requirements of the syntax guidelines.



Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Quentin Rameau
Hello all,

> Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""` is
> unspecified

I don't think so, -d "" is just a list with an empty string.
So -d '\0' is equivalent to -d '', '\0' is here to let the user express
an empty string in a list, which wouldn't be possible otherwise (like
how would one specify empty string in a list like 'abcd').

> and `-d""` is invalid, since paste(1) must follow the
> utility syntax guidelines (guideline 7).

Not sure what you mean there, -d"" is the concatenation of -d and '',
which is standard.
Did you quote the correct guideline? “Guideline 7: Option-arguments
should not be optional.” here there's an option-argument, that's an
empty string.



Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Michael Forney
Hi Richard,

Thanks for the patch. Can you clarify if this change fixes `-d "\0"`,
`-d ""` or `-d""`?

Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""` is
unspecified, and `-d""` is invalid, since paste(1) must follow the
utility syntax guidelines (guideline 7).

I recently investigated a similar issue to this in tr(1). I think a
proper solution would be to add length parameters to utflen and
utftorunestr so that they can handle 0-bytes in the strings.

On 2020-03-02, Richard Ipsum  wrote:
> ---
>  paste.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/paste.c b/paste.c
> index b0ac761..0159fe0 100644
> --- a/paste.c
> +++ b/paste.c
> @@ -53,7 +53,8 @@ nextline:
>
>   for (; efgetrune(, dsc[i].fp, dsc[i].name) ;) {
>   for (m = last + 1; m < i; m++)
> - efputrune(&(delim[m % delimlen]), stdout, 
> "");
> + if (delim[m % delimlen] != '\0')
> + efputrune(&(delim[m % delimlen]), 
> stdout, "");
>   last = i;
>   if (c == '\n') {
>   if (i != fdescrlen - 1)
> @@ -68,7 +69,8 @@ nextline:
>   if (i == fdescrlen - 1)
>   putchar('\n');
>   else
> - efputrune(, stdout, "");
> + if (d != '\0')
> + efputrune(, stdout, "");
>   last++;
>   }
>   }
> @@ -96,7 +98,7 @@ main(int argc, char *argv[])
>   seq = 1;
>   break;
>   case 'd':
> - adelim = EARGF(usage());
> + adelim = ARGF();

I think allowing missing option-argument here breaks POSIX compatibility.

>   unescape(adelim);
>   break;
>   default:
> @@ -109,8 +111,12 @@ main(int argc, char *argv[])
>   /* populate delimiters */
>   /* TODO: fix libutf to accept sizes */
>   delim = ereallocarray(NULL, utflen(adelim) + 1, sizeof(*delim));
> - if (!(delimlen = utftorunestr(adelim, delim)))
> + if (*adelim == '\0') {
> + delimlen = 1;
> + *delim = '\0';
> + } else if (!(delimlen = utftorunestr(adelim, delim))) {
>   usage();
> + }
>
>   /* populate file list */
>   dsc = ereallocarray(NULL, argc, sizeof(*dsc));
> --
> 2.25.1



[hackers] [PATCH][sbase] paste: Allow null delim

2020-03-02 Thread Richard Ipsum
---
 paste.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/paste.c b/paste.c
index b0ac761..0159fe0 100644
--- a/paste.c
+++ b/paste.c
@@ -53,7 +53,8 @@ nextline:
 
for (; efgetrune(, dsc[i].fp, dsc[i].name) ;) {
for (m = last + 1; m < i; m++)
-   efputrune(&(delim[m % delimlen]), stdout, 
"");
+   if (delim[m % delimlen] != '\0')
+   efputrune(&(delim[m % delimlen]), 
stdout, "");
last = i;
if (c == '\n') {
if (i != fdescrlen - 1)
@@ -68,7 +69,8 @@ nextline:
if (i == fdescrlen - 1)
putchar('\n');
else
-   efputrune(, stdout, "");
+   if (d != '\0')
+   efputrune(, stdout, "");
last++;
}
}
@@ -96,7 +98,7 @@ main(int argc, char *argv[])
seq = 1;
break;
case 'd':
-   adelim = EARGF(usage());
+   adelim = ARGF();
unescape(adelim);
break;
default:
@@ -109,8 +111,12 @@ main(int argc, char *argv[])
/* populate delimiters */
/* TODO: fix libutf to accept sizes */
delim = ereallocarray(NULL, utflen(adelim) + 1, sizeof(*delim));
-   if (!(delimlen = utftorunestr(adelim, delim)))
+   if (*adelim == '\0') {
+   delimlen = 1;
+   *delim = '\0';
+   } else if (!(delimlen = utftorunestr(adelim, delim))) {
usage();
+   }
 
/* populate file list */
dsc = ereallocarray(NULL, argc, sizeof(*dsc));
-- 
2.25.1