Re: new getformat() for jot(1)

2016-09-06 Thread Theo Buehler
On Wed, Sep 07, 2016 at 12:00:17AM +0200, Martin Natano wrote:
> On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote:
> > The -w flag to jot() allows the user to specify a format string that
> > will be passed to printf after it was verified that it contains only one
> > conversion specifier.  There are some subtle differences what jot's
> > getformat() accepts and what printf will do. Thus, I took vfprintf.c and
> > carved out whatever is unneeded for jot itself. The result is a slightly
> > less ugly function in a separate file.
> 
> Please see some comments below.
> 
> I really tried to understand all the corner cases in the getformat()
> function, but couldn't wrap my head around it. I believe it would be
> best to just axe the -w flag. Yes, there are probably scripts out there
> using it, but I think carrying that burden around is not worth it. Every
> possible implementation either is an adapted reimplementation of printf,
> or whitewashing the string before passing it to printf(). I would like
> to remind of the patch(1) ed script issue that resulted in shell
> injection, just because the whitewash code and the actual parser where
> not in sync. It's a losing game.

Thanks for taking the time of looking into it. I would very much like to
ditch -w if there are no objections. It's just plain horrific and a
terrible idea to begin with.

I can understand the convenience of something like "jot -w '%02d' 10",
but everything more complicated seems to be better left to awk, perl or
whatever else your preferred scripting language is.

There is a similar printf reimplementation in usr.bin/printf/prinf.c
which is just as unpleasant, but mandated by POSIX.

Below is a diff addressing your comments.

> 'boring', 'infinity' and 'randomize' are unused in getformat.c.

yes.

> How about something like this instead?
> 
>   #define is_digit(c) ((c) >= '0' && (c) <= '9')
> 
> This would also allow to remove the to_digit() macro.

I agree.

> > +   sz = sizeof(fmt) - strlen(fmt) - 1;
> 
> The sizeof() doesn't do what you expect it to do here.

that was really stupid, thanks.

> Previously the error message for 'jot -w %d%d 10' was "jot: too many
> conversions", now it is "jot: illegal or unsupported format '%d%d'". I
> think the previous error was more helpful.

restored previous error message

> > +   if (ch == '$')
> 
> What is this check supposed to do? There is no '$' case, so the default
> will be invoked after 'got reswitch;'.

incomplete purging of cases unneeded for jot

Index: Makefile
===
RCS file: /var/cvs/src/usr.bin/jot/Makefile,v
retrieving revision 1.5
diff -u -p -r1.5 Makefile
--- Makefile10 Jan 2016 01:15:52 -  1.5
+++ Makefile6 Sep 2016 22:35:48 -
@@ -1,6 +1,7 @@
 #  $OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $
 
 PROG=  jot
+SRCS=  getformat.c jot.c
 CFLAGS+= -Wall
 LDADD+=-lm
 DPADD+=${LIBM}
Index: getformat.c
===
RCS file: getformat.c
diff -N getformat.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ getformat.c 6 Sep 2016 22:35:58 -
@@ -0,0 +1,180 @@
+/* $OpenBSD$   */
+/*-
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * Chris Torek.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/* Based on 

Re: new getformat() for jot(1)

2016-09-06 Thread Martin Natano
On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote:
> The -w flag to jot() allows the user to specify a format string that
> will be passed to printf after it was verified that it contains only one
> conversion specifier.  There are some subtle differences what jot's
> getformat() accepts and what printf will do. Thus, I took vfprintf.c and
> carved out whatever is unneeded for jot itself. The result is a slightly
> less ugly function in a separate file.

Please see some comments below.

I really tried to understand all the corner cases in the getformat()
function, but couldn't wrap my head around it. I believe it would be
best to just axe the -w flag. Yes, there are probably scripts out there
using it, but I think carrying that burden around is not worth it. Every
possible implementation either is an adapted reimplementation of printf,
or whitewashing the string before passing it to printf(). I would like
to remind of the patch(1) ed script issue that resulted in shell
injection, just because the whitewash code and the actual parser where
not in sync. It's a losing game.

natano


> 
> Index: Makefile
> ===
> RCS file: /var/cvs/src/usr.bin/jot/Makefile,v
> retrieving revision 1.5
> diff -u -p -r1.5 Makefile
> --- Makefile  10 Jan 2016 01:15:52 -  1.5
> +++ Makefile  3 Sep 2016 15:48:06 -
> @@ -1,6 +1,7 @@
>  #$OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $
>  
>  PROG=jot
> +SRCS=getformat.c jot.c
>  CFLAGS+= -Wall
>  LDADD+=  -lm
>  DPADD+=  ${LIBM}
> Index: getformat.c
> ===
> RCS file: getformat.c
> diff -N getformat.c
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ getformat.c   3 Sep 2016 15:55:21 -
> @@ -0,0 +1,188 @@
> +/*   $OpenBSD$   */
> +/*-
> + * Copyright (c) 1990 The Regents of the University of California.
> + * All rights reserved.
> + *
> + * This code is derived from software contributed to Berkeley by
> + * Chris Torek.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the University nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +/* Based on src/lib/libc/stdio/vfprintf.c r1.77 */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +extern int   prec;
> +extern bool  boring;
> +extern bool  chardata;
> +extern bool  infinity;
> +extern bool  intdata;
> +extern bool  longdata;
> +extern bool  nosign;
> +extern bool  randomize;

'boring', 'infinity' and 'randomize' are unused in getformat.c.


> +
> +int getformat(char *);
> +
> +/*
> + * Macros for converting digits to letters and vice versa
> + */
> +#define  to_digit(c) ((c) - '0')
> +#define  is_digit(c) ((unsigned)to_digit(c) <= 9)

I would prefer this to be less magic. How about something like this
instead?

#define is_digit(c) ((c) >= '0' && (c) <= '9')

This would also allow to remove the to_digit() macro.


> +
> +int
> +getformat(char *fmt)
> +{
> + int ch; /* character from fmt */
> + wchar_t wc;
> + mbstate_t ps;
> + size_t sz;
> + bool firsttime = true;
> +
> + sz = sizeof(fmt) - strlen(fmt) - 1;

The sizeof() doesn't do what you expect it to do here. 'fmt' is just a
pointer here, so the value returned is far to low. What we want is the
size of the original array instead.

$ jot -w 'xxx' 10
jot: -w word too long


> +
> + memset(, 0, sizeof(ps));
> + /*
> +  * Scan the 

new getformat() for jot(1)

2016-09-03 Thread Theo Buehler
The -w flag to jot() allows the user to specify a format string that
will be passed to printf after it was verified that it contains only one
conversion specifier.  There are some subtle differences what jot's
getformat() accepts and what printf will do. Thus, I took vfprintf.c and
carved out whatever is unneeded for jot itself. The result is a slightly
less ugly function in a separate file.

Index: Makefile
===
RCS file: /var/cvs/src/usr.bin/jot/Makefile,v
retrieving revision 1.5
diff -u -p -r1.5 Makefile
--- Makefile10 Jan 2016 01:15:52 -  1.5
+++ Makefile3 Sep 2016 15:48:06 -
@@ -1,6 +1,7 @@
 #  $OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $
 
 PROG=  jot
+SRCS=  getformat.c jot.c
 CFLAGS+= -Wall
 LDADD+=-lm
 DPADD+=${LIBM}
Index: getformat.c
===
RCS file: getformat.c
diff -N getformat.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ getformat.c 3 Sep 2016 15:55:21 -
@@ -0,0 +1,188 @@
+/* $OpenBSD$   */
+/*-
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * Chris Torek.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/* Based on src/lib/libc/stdio/vfprintf.c r1.77 */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+extern int prec;
+extern boolboring;
+extern boolchardata;
+extern boolinfinity;
+extern boolintdata;
+extern boollongdata;
+extern boolnosign;
+extern boolrandomize;
+
+int getformat(char *);
+
+/*
+ * Macros for converting digits to letters and vice versa
+ */
+#defineto_digit(c) ((c) - '0')
+#defineis_digit(c) ((unsigned)to_digit(c) <= 9)
+
+int
+getformat(char *fmt)
+{
+   int ch; /* character from fmt */
+   wchar_t wc;
+   mbstate_t ps;
+   size_t sz;
+   bool firsttime = true;
+
+   sz = sizeof(fmt) - strlen(fmt) - 1;
+
+   memset(, 0, sizeof(ps));
+   /*
+* Scan the format for conversions (`%' character).
+*/
+   for (;;) {
+   size_t len;
+
+   while ((len = mbrtowc(, fmt, MB_CUR_MAX, )) != 0) {
+   if (len == (size_t)-1 || len == (size_t)-2) {
+   err(1, "illegal or incomplete byte sequence "
+   "in format string");
+   }
+   if (wc == '%') {
+   if (*(fmt + 1) == '\0') {
+   /* no trailing '%' allowed */
+   if (sz <= 0)
+   errx(1, "-w word too long");
+   strlcat(fmt, "%", sizeof fmt);
+   return 0;
+   } else if (*(fmt + 1) != '%')
+   break;
+   fmt += 2;   /* skip over "%%" */
+   } else
+   fmt += len;
+   }
+
+   if (!firsttime) {
+   if (len == 0)
+   return 0;
+   return 1;
+   }
+
+   if (len == 0) {
+   if (chardata) {
+   if