Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On Mon, 1 Jul 2019 22:30:43 -0700 Michael Forney wrote: Dear Michael, > Thanks for investigating this, Laslo. Your findings seem to match my > quick experimentation, and at this point I am leaning towards sticking > with ARGBEGIN/ARGEND, but using a simpler implementation like the one > I shared or the one you are using for farbfeld. It's good to know that > we could wrap getopt(3) if necessary, but I had forgotten about > POSIXLY_CORRECT. I think that alone might be reason enough to avoid > it. I think the argv permutation glibc getopt does by default would be > a serious regression. glibc is such a bloody piece of garbage! It is an outright insult that they are deliberately POSIX-non-compliant and only offer to set some crude ugly environment variable (what could go wrong?) to fix it. Their arrogance and lethargy is legendary and the only reason anybody is still using the glibc and other GNU abominations is due to their insistence on extending all standard interfaces, including the kitchen sink, with their own proprietary stuff (yes, Mr. Stallman, glibc is a properitary mess). When a developer is not extra-careful they end up using these extensions. Switching the glibc means, especially for big and old projects, rewriting significant parts of the code. It's even more of a shame that the glibc assumes it's "the only true" libc, so much that musl has to imitate it in many ways. If I were a musl developer I'd print a #warning every time an #ifdef is triggered that was placed deliberately to babysit GNU-reliant code, so people finally stop using GNUisms and we can stop this viscious cycle. > Additionally, I've been thinking about the argv = { NULL } case. > Currently argv0 will get set to NULL, but there seems to be a lot of > places where it is passed to printf without checking (even though most > implementations will print "(null)"). I'm thinking we should probably > provide a fallback argv0 in this case. Perhaps we could pass it as a > parameter to ARGBEGIN, like > > ARGBEGIN("rm") { > ... > } ARGEND > > Or maybe we could just introduce an idiom always used as the first > statement in main() like > > argv0 = argc ? argv[0] : "rm"; > > rather than implicitly setting it ARGBEGIN. I would keep it simple and add a small check in ARGBEGIN to set argv0 either to argv[0] or "(null)" if it is NULL. Sure, printf may handle NULL properly, but it's an error in OpenBSD and we should avoid passing NULLs to it. However, what we should never do is extend the code and interfaces for crude edge cases. With best regards Laslo -- Laslo Hunhold pgp6WDoV37ywG.pgp Description: PGP signature
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On 2019-06-30, Laslo Hunhold wrote: > What do the others think? Thanks for investigating this, Laslo. Your findings seem to match my quick experimentation, and at this point I am leaning towards sticking with ARGBEGIN/ARGEND, but using a simpler implementation like the one I shared or the one you are using for farbfeld. It's good to know that we could wrap getopt(3) if necessary, but I had forgotten about POSIXLY_CORRECT. I think that alone might be reason enough to avoid it. I think the argv permutation glibc getopt does by default would be a serious regression. Additionally, I've been thinking about the argv = { NULL } case. Currently argv0 will get set to NULL, but there seems to be a lot of places where it is passed to printf without checking (even though most implementations will print "(null)"). I'm thinking we should probably provide a fallback argv0 in this case. Perhaps we could pass it as a parameter to ARGBEGIN, like ARGBEGIN("rm") { ... } ARGEND Or maybe we could just introduce an idiom always used as the first statement in main() like argv0 = argc ? argv[0] : "rm"; rather than implicitly setting it ARGBEGIN.
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On Sun, 30 Jun 2019 21:20:43 -0700 Michael Forney wrote: Dear Michael, > I'm okay with switching to getopt(3), but also note that the current > arg.h is probably more complicated than it needs to be. Here's a > version I rewrote that I've been using in my own projects: > > https://git.sr.ht/~mcf/samurai/blob/master/arg.h > > I agree that getopt(3) would probably be better at handling the corner > cases. Yesterday I was planning to check that tools behaved correctly > with argv = { NULL }. The first thing I tried was rm(1), which tried > to remove files corresponding to my environment (i.e. > "SHELL=/bin/ksh"). Yikes! > > I am also not sure how getopt(3) could be used to handle the tricky > cases I mentioned, like printf(1). It doesn't have any options, but > still needs to support `--` and treat arguments that start with `-` as > operands rather than printing a usage message. as an addendum, my own research has yielded some points against getopt. - before returning '?' on an invalid option, it prints a bloody error message. You will have to set the global variable "opterr" to 0 before calling getopt to prevent stupid error messages from flooding stderr. - The behaviour of returning ':' or '?' for a missing argument is a bit convoluted. Read the Posix-page to know what I mean, but in short it only returns ':' at all if the first character of the optstring is ':'. So we can get rid of the distinction and simplify the switch a bit; see below. - getopt(3) is not thread-safe, as it is not reentrant and difficult to reset, as this is undefined behaviour as modifying the global state is undefined behaviour. - One aspect I totally missed is that argc won't be modified either. Instead, getopt() sets a global variable "optind" corresponding to the first non-option argument. So the port I made in the previous mail would actually have to look like below. int c; opterr = 0; while ((c = getopt(argc, argv, "abc:")) != -1) { switch (c) { case 'a': aflag = 1; break; case 'b': bflag = 1; break; case 'c': cflag = 1; carg = optarg; break; default: usage(); } } argv += optind; argc -= optind; This should 1:1 correspond to the arg.h-usage ARGBEGIN { case 'a': aflag = 1; break; case 'b': bflag = 1; break; case 'c': cflag = 1; carg = EARGF(usage()); default: usage(); } ARGEND What we can really notice is that due to the amalgamation of the cases '?' and ':' into "default" the switches are nearly identical, modulo the uses of EARGF(). It would thus be possible to redefine the ARGBEGIN- and ARGEND-macros in terms of getopt(3). To "contain" the local variable 'c', instead of a block, I would wrap it all into a do {int c; ...} while (false) so c has a local context and the macro is relatively safe. ARGEND would then just be a } while (false) What do the others think? With best regards Laslo -- Laslo Hunhold pgpr8yLxz6E_W.pgp Description: PGP signature
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On Sun, 30 Jun 2019 21:20:43 -0700 Michael Forney wrote: Dear Michael, > I'm okay with switching to getopt(3), but also note that the current > arg.h is probably more complicated than it needs to be. Here's a > version I rewrote that I've been using in my own projects: > > https://git.sr.ht/~mcf/samurai/blob/master/arg.h > > I agree that getopt(3) would probably be better at handling the corner > cases. Yesterday I was planning to check that tools behaved correctly > with argv = { NULL }. The first thing I tried was rm(1), which tried > to remove files corresponding to my environment (i.e. > "SHELL=/bin/ksh"). Yikes! > > I am also not sure how getopt(3) could be used to handle the tricky > cases I mentioned, like printf(1). It doesn't have any options, but > still needs to support `--` and treat arguments that start with `-` as > operands rather than printing a usage message. I fixed the bugs in arg.h a few years ago and use the "fixed version" in farbfeld and quark (see [0]). Besides the NULL-bug it also allows using ARGF() more than once in a case and other things. See the commit[1] for more info. Glibc requires the environment variable "POSIXLY_CORRECT" to be set to be fully Posix[2] conformant, but that's just the typical glibc-insanity. Let's see how a port would look like. A one-to-one mapping from arg.h to getopt() might be for instance as follows: ARGBEGIN { case 'a': aflag = 1; break; case 'b': bflag = 1; break; case 'c': cflag = 1; carg = EARGF(usage()); default: usage(); } ARGEND Would be transformed to int c; while ((c = getopt(argc, argv, "abc:")) != -1) { switch (c) { case 'a': aflag = 1; break; case 'b': bflag = 1; break; case 'c': cflag = 1; carg = optarg; break; case ':': case '?': usage(); } } Which of the solutions wins this in regard to readability is clear, but the difference is not too big. The thought of using getopt(3) has crossed my mind more than once in the past few years. What do the others think? Is there any big reason speaking against getopt(3)? With best regards Laslo [0]:https://git.suckless.org/farbfeld/file/arg.h.html [1]:https://git.suckless.org/farbfeld/commit/31651271e1afd99983fb3d0ec51a273e31aaf4e9.html [2]:https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html -- Laslo Hunhold pgpKSPs5bnqtn.pgp Description: PGP signature
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On 2019-06-30, Evan Gates wrote: > On Fri, Jun 28, 2019 at 3:08 AM Michael Forney wrote: >> >> Then, maybe something like the following would work for those cases. >> >> ARGBEGIN { >> default: >> goto done; >> } ARGEND >> done: > > Seeing as we have more confusion and bugs to deal with in argument > handling, perhaps now is a time to revist the conversation surrounding > getopt(3p)? There is a POSIX mandated libc function that parses POSIX > options following the utility syntax guidelines. I strongly suggest > using it. I know there has been pushback in the past but it's a > portable option that solves our exact problems. This is why it exists. > It's already part of libc. We should stop avoiding it in favor of a > difficult to read block of macros that have trouble dealing with corner > cases. Removing arg.h simplifies sbase code and maintenance and helps > break the cycle of NIH that we often fall into. I'm okay with switching to getopt(3), but also note that the current arg.h is probably more complicated than it needs to be. Here's a version I rewrote that I've been using in my own projects: https://git.sr.ht/~mcf/samurai/blob/master/arg.h I agree that getopt(3) would probably be better at handling the corner cases. Yesterday I was planning to check that tools behaved correctly with argv = { NULL }. The first thing I tried was rm(1), which tried to remove files corresponding to my environment (i.e. "SHELL=/bin/ksh"). Yikes! I am also not sure how getopt(3) could be used to handle the tricky cases I mentioned, like printf(1). It doesn't have any options, but still needs to support `--` and treat arguments that start with `-` as operands rather than printing a usage message.
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On Fri, Jun 28, 2019 at 3:08 AM Michael Forney wrote: > > Then, maybe something like the following would work for those cases. > > ARGBEGIN { > default: > goto done; > } ARGEND > done: Seeing as we have more confusion and bugs to deal with in argument handling, perhaps now is a time to revist the conversation surrounding getopt(3p)? There is a POSIX mandated libc function that parses POSIX options following the utility syntax guidelines. I strongly suggest using it. I know there has been pushback in the past but it's a portable option that solves our exact problems. This is why it exists. It's already part of libc. We should stop avoiding it in favor of a difficult to read block of macros that have trouble dealing with corner cases. Removing arg.h simplifies sbase code and maintenance and helps break the cycle of NIH that we often fall into.
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On 2019-06-28, Laslo Hunhold wrote: >> Rather than add a special ENOFLAGS macro, I am tempted to revert >> 9016d288 instead. The question that remains is how to handle arguments >> that look like options (start with `-`) for tools that don't have any >> specified options. It seems for some tools it is important to treat >> them as operands (e.g. printf, expr), but for others that are commonly >> extended with additional options, it might make sense to fail with >> usage() to catch unsupported uses (e.g. hostname, tsort, chroot). > > thanks for elaborating about it more! This topic was a big point of > debate back then. In this matter Posix should have the final word. So > in this sense, if Posix mandates for e.g. dirname(1) to support -- then > we should not ignore -- and the user will have to write > > $ dirname -- -- > > I think though that we all agree that an ENOFLAGS-macro would be > insanity. I don't necessarily think it would be insanity, and actually prefer it to the one-liner used currently. Something like it might even be necessary to handle `--` without consuming any option arguments (for the printf and expr case mentioned above) unless we change the ARGBEGIN macro to not advance the argv strings. Then, maybe something like the following would work for those cases. ARGBEGIN { default: goto done; } ARGEND done: Though, this isn't particularly nice either...
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On 2019-06-28, Michael Forney wrote: > As far as I know, unless the documentation states that a utility shall > conform to the Utility Syntax Guidelines, it is not required to > support `--`. However, it does use the language "should" which > means[2]: > > For an implementation that conforms to POSIX.1-2017, describes a > feature or behavior that is recommended but not mandatory. I found a section in POSIX that is a bit more explicit about this, and it looks like handling `--` is indeed required. See the OPTIONS section in https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap01.html#tag_17_04 Default Behavior: When this section is listed as "None.", it means that the implementation need not support any options. Standard utilities that do not accept options, but that do accept operands, shall recognize "--" as a first argument to be discarded. The requirement for recognizing "--" is because conforming applications need a way to shield their operands from any arbitrary options that the implementation may provide as an extension.
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
On Fri, 28 Jun 2019 00:52:33 -0700 Michael Forney wrote: Dear Michael, > I'm really sorry for ignoring this for so long. Someone asked me about > the `argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;` one-liner, > and I remembered this patch. I finally took the time to investigate > this issue in more detail. > > I was confused by this for a bit, but I think you are referring to > http://austingroupbugs.net/view.php?id=192, which was not present in > the 2008 edition[0], but did make it into the 2013 edition[1]. > > As far as I know, unless the documentation states that a utility shall > conform to the Utility Syntax Guidelines, it is not required to > support `--`. However, it does use the language "should" which > means[2]: > > For an implementation that conforms to POSIX.1-2017, > describes a feature or behavior that is recommended but not mandatory. > > It looks like these tools formerly supported `--`, but this was > removed in 9016d288[3], with the justification that echo(1) is > mandated to *not* support `--` and that there is no reason why other > tools with no options should handle it. I would argue that there *is* > a reason to support `--`, and that is the Utility Syntax Guidelines. > echo(1) is an exception that is explicitly called out, but a single > exception shouldn't set a precedent for other tools. In fact, tty(1) > is actually required to support the Utility Syntax Guidelines[4], so > 9016d288 actually breaks POSIX conformance. The examples you mentioned > in basename(1) and dirname(1), and also a note about `--` I found in > expr(1) seem to support this view. > > Rather than add a special ENOFLAGS macro, I am tempted to revert > 9016d288 instead. The question that remains is how to handle arguments > that look like options (start with `-`) for tools that don't have any > specified options. It seems for some tools it is important to treat > them as operands (e.g. printf, expr), but for others that are commonly > extended with additional options, it might make sense to fail with > usage() to catch unsupported uses (e.g. hostname, tsort, chroot). thanks for elaborating about it more! This topic was a big point of debate back then. In this matter Posix should have the final word. So in this sense, if Posix mandates for e.g. dirname(1) to support -- then we should not ignore -- and the user will have to write $ dirname -- -- I think though that we all agree that an ENOFLAGS-macro would be insanity. With best regards Laslo -- Laslo Hunhold pgpLIiwTlfZKI.pgp Description: PGP signature
Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
Hi Mattias, I'm really sorry for ignoring this for so long. Someone asked me about the `argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;` one-liner, and I remembered this patch. I finally took the time to investigate this issue in more detail. On 2018-07-08, Mattias Andrée wrote: > In POSIX-2017 it was clarified via the documentation for > basename(1) and dirname(1) that all programs should support > -- unless specified otherwise. I was confused by this for a bit, but I think you are referring to http://austingroupbugs.net/view.php?id=192, which was not present in the 2008 edition[0], but did make it into the 2013 edition[1]. As far as I know, unless the documentation states that a utility shall conform to the Utility Syntax Guidelines, it is not required to support `--`. However, it does use the language "should" which means[2]: For an implementation that conforms to POSIX.1-2017, describes a feature or behavior that is recommended but not mandatory. It looks like these tools formerly supported `--`, but this was removed in 9016d288[3], with the justification that echo(1) is mandated to *not* support `--` and that there is no reason why other tools with no options should handle it. I would argue that there *is* a reason to support `--`, and that is the Utility Syntax Guidelines. echo(1) is an exception that is explicitly called out, but a single exception shouldn't set a precedent for other tools. In fact, tty(1) is actually required to support the Utility Syntax Guidelines[4], so 9016d288 actually breaks POSIX conformance. The examples you mentioned in basename(1) and dirname(1), and also a note about `--` I found in expr(1) seem to support this view. Rather than add a special ENOFLAGS macro, I am tempted to revert 9016d288 instead. The question that remains is how to handle arguments that look like options (start with `-`) for tools that don't have any specified options. It seems for some tools it is important to treat them as operands (e.g. printf, expr), but for others that are commonly extended with additional options, it might make sense to fail with usage() to catch unsupported uses (e.g. hostname, tsort, chroot). [0] http://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/basename.html [1] http://pubs.opengroup.org/onlinepubs/9699919799.2013edition/utilities/basename.html [2] https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap01.html#tag_01_05_06 [3] https://git.suckless.org/sbase/commit/9016d288f1cd03bf514fe84d537b203040e1ccf8.html [4] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tty.html#tag_20_135_04
[hackers] [sbase][PATCH] Support -- in all utilities except echo(1)
In POSIX-2017 it was clarified via the documentation for basename(1) and dirname(1) that all programs should support -- unless specified otherwise. Signed-off-by: Mattias Andrée --- arg.h | 13 + basename.c | 5 + chroot.c | 2 +- cksum.c| 8 +++- dirname.c | 5 + expr.c | 8 +++- hostname.c | 2 +- link.c | 2 +- logname.c | 2 +- nohup.c| 2 +- printenv.c | 8 +++- printf.c | 9 + rev.c | 5 + setsid.c | 2 +- sleep.c| 2 +- sponge.c | 2 +- sync.c | 2 +- tsort.c| 5 + tty.c | 2 +- unlink.c | 2 +- whoami.c | 2 +- yes.c | 8 +++- 22 files changed, 62 insertions(+), 36 deletions(-) diff --git a/arg.h b/arg.h index 0b23c53..e8b84ba 100644 --- a/arg.h +++ b/arg.h @@ -62,4 +62,17 @@ extern char *argv0; #define LNGARG() [0][0] +#define ENOFLAGS(x)do {\ + argv0 = *argv++;\ + argc--;\ + if (argc && argv[0][0] == '-') {\ + if (argv[0][1] == '-' && !argv[0][2]) {\ + argv++;\ + argc--;\ + } else {\ + (x);\ + }\ + }\ + } while (0) + #endif diff --git a/basename.c b/basename.c index 94a2848..de41d86 100644 --- a/basename.c +++ b/basename.c @@ -17,10 +17,7 @@ main(int argc, char *argv[]) ssize_t off; char *p; - ARGBEGIN { - default: - usage(); - } ARGEND + ENOFLAGS(usage()); if (argc != 1 && argc != 2) usage(); diff --git a/chroot.c b/chroot.c index 22bc62e..e3d4c3e 100644 --- a/chroot.c +++ b/chroot.c @@ -17,7 +17,7 @@ main(int argc, char *argv[]) char *shell[] = { "/bin/sh", "-i", NULL }, *aux, *cmd; int savederrno; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (!argc) usage(); diff --git a/cksum.c b/cksum.c index 4e7dce6..68c8fc6 100644 --- a/cksum.c +++ b/cksum.c @@ -92,12 +92,18 @@ cksum(int fd, const char *s) putchar('\n'); } +static void +usage(void) +{ + eprintf("usage: %s [file ...]\n", argv0); +} + int main(int argc, char *argv[]) { int fd; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (!argc) { cksum(0, NULL); diff --git a/dirname.c b/dirname.c index 45e1a7e..fcc12e1 100644 --- a/dirname.c +++ b/dirname.c @@ -13,10 +13,7 @@ usage(void) int main(int argc, char *argv[]) { - ARGBEGIN { - default: - usage(); - } ARGEND + ENOFLAGS(usage()); if (argc != 1) usage(); diff --git a/expr.c b/expr.c index d9c758d..b9bed3f 100644 --- a/expr.c +++ b/expr.c @@ -252,12 +252,18 @@ parse(char *expr[], int numexpr) return (valp->str && *valp->str) || valp->num; } +static void +usage(void) +{ + eprintf("usage: %s expression\n", argv0); +} + int main(int argc, char *argv[]) { int ret; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); ret = !parse(argv, argc); diff --git a/hostname.c b/hostname.c index 495d40d..cd66893 100644 --- a/hostname.c +++ b/hostname.c @@ -16,7 +16,7 @@ main(int argc, char *argv[]) { char host[HOST_NAME_MAX + 1]; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (!argc) { if (gethostname(host, sizeof(host)) < 0) diff --git a/link.c b/link.c index a260136..cf76f32 100644 --- a/link.c +++ b/link.c @@ -12,7 +12,7 @@ usage(void) int main(int argc, char *argv[]) { - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (argc != 2) usage(); diff --git a/logname.c b/logname.c index 8eb8eea..04a2aaa 100644 --- a/logname.c +++ b/logname.c @@ -15,7 +15,7 @@ main(int argc, char *argv[]) { char *login; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (argc) usage(); diff --git a/nohup.c b/nohup.c index c75ea45..436c072 100644 --- a/nohup.c +++ b/nohup.c @@ -19,7 +19,7 @@ main(int argc, char *argv[]) { int fd, savederrno; - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0; + ENOFLAGS(usage()); if (!argc) usage(); diff --git a/printenv.c b/printenv.c index 7ff5393..7ba0e47 100644 --- a/printenv.c +++ b/printenv.c @@ -6,13 +6,19 @@ extern char **environ; +static void +usage(void) +{ + eprintf("usage: %s [var ...]\n", argv0); +} + int main(int argc, char *argv[]) { char *var;