Re: new getformat() for jot(1)

2016-09-06 Thread Theo Buehler
On Wed, Sep 07, 2016 at 12:00:17AM +0200, Martin Natano wrote:
> On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote:
> > The -w flag to jot() allows the user to specify a format string that
> > will be passed to printf after it was verified that it contains only one
> > conversion specifier.  There are some subtle differences what jot's
> > getformat() accepts and what printf will do. Thus, I took vfprintf.c and
> > carved out whatever is unneeded for jot itself. The result is a slightly
> > less ugly function in a separate file.
> 
> Please see some comments below.
> 
> I really tried to understand all the corner cases in the getformat()
> function, but couldn't wrap my head around it. I believe it would be
> best to just axe the -w flag. Yes, there are probably scripts out there
> using it, but I think carrying that burden around is not worth it. Every
> possible implementation either is an adapted reimplementation of printf,
> or whitewashing the string before passing it to printf(). I would like
> to remind of the patch(1) ed script issue that resulted in shell
> injection, just because the whitewash code and the actual parser where
> not in sync. It's a losing game.

Thanks for taking the time of looking into it. I would very much like to
ditch -w if there are no objections. It's just plain horrific and a
terrible idea to begin with.

I can understand the convenience of something like "jot -w '%02d' 10",
but everything more complicated seems to be better left to awk, perl or
whatever else your preferred scripting language is.

There is a similar printf reimplementation in usr.bin/printf/prinf.c
which is just as unpleasant, but mandated by POSIX.

Below is a diff addressing your comments.

> 'boring', 'infinity' and 'randomize' are unused in getformat.c.

yes.

> How about something like this instead?
> 
>   #define is_digit(c) ((c) >= '0' && (c) <= '9')
> 
> This would also allow to remove the to_digit() macro.

I agree.

> > +   sz = sizeof(fmt) - strlen(fmt) - 1;
> 
> The sizeof() doesn't do what you expect it to do here.

that was really stupid, thanks.

> Previously the error message for 'jot -w %d%d 10' was "jot: too many
> conversions", now it is "jot: illegal or unsupported format '%d%d'". I
> think the previous error was more helpful.

restored previous error message

> > +   if (ch == '$')
> 
> What is this check supposed to do? There is no '$' case, so the default
> will be invoked after 'got reswitch;'.

incomplete purging of cases unneeded for jot

Index: Makefile
===
RCS file: /var/cvs/src/usr.bin/jot/Makefile,v
retrieving revision 1.5
diff -u -p -r1.5 Makefile
--- Makefile10 Jan 2016 01:15:52 -  1.5
+++ Makefile6 Sep 2016 22:35:48 -
@@ -1,6 +1,7 @@
 #  $OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $
 
 PROG=  jot
+SRCS=  getformat.c jot.c
 CFLAGS+= -Wall
 LDADD+=-lm
 DPADD+=${LIBM}
Index: getformat.c
===
RCS file: getformat.c
diff -N getformat.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ getformat.c 6 Sep 2016 22:35:58 -
@@ -0,0 +1,180 @@
+/* $OpenBSD$   */
+/*-
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * Chris Torek.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/* Based on 

Re: new getformat() for jot(1)

2016-09-06 Thread Martin Natano
On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote:
> The -w flag to jot() allows the user to specify a format string that
> will be passed to printf after it was verified that it contains only one
> conversion specifier.  There are some subtle differences what jot's
> getformat() accepts and what printf will do. Thus, I took vfprintf.c and
> carved out whatever is unneeded for jot itself. The result is a slightly
> less ugly function in a separate file.

Please see some comments below.

I really tried to understand all the corner cases in the getformat()
function, but couldn't wrap my head around it. I believe it would be
best to just axe the -w flag. Yes, there are probably scripts out there
using it, but I think carrying that burden around is not worth it. Every
possible implementation either is an adapted reimplementation of printf,
or whitewashing the string before passing it to printf(). I would like
to remind of the patch(1) ed script issue that resulted in shell
injection, just because the whitewash code and the actual parser where
not in sync. It's a losing game.

natano


> 
> Index: Makefile
> ===
> RCS file: /var/cvs/src/usr.bin/jot/Makefile,v
> retrieving revision 1.5
> diff -u -p -r1.5 Makefile
> --- Makefile  10 Jan 2016 01:15:52 -  1.5
> +++ Makefile  3 Sep 2016 15:48:06 -
> @@ -1,6 +1,7 @@
>  #$OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $
>  
>  PROG=jot
> +SRCS=getformat.c jot.c
>  CFLAGS+= -Wall
>  LDADD+=  -lm
>  DPADD+=  ${LIBM}
> Index: getformat.c
> ===
> RCS file: getformat.c
> diff -N getformat.c
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ getformat.c   3 Sep 2016 15:55:21 -
> @@ -0,0 +1,188 @@
> +/*   $OpenBSD$   */
> +/*-
> + * Copyright (c) 1990 The Regents of the University of California.
> + * All rights reserved.
> + *
> + * This code is derived from software contributed to Berkeley by
> + * Chris Torek.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the University nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +/* Based on src/lib/libc/stdio/vfprintf.c r1.77 */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +extern int   prec;
> +extern bool  boring;
> +extern bool  chardata;
> +extern bool  infinity;
> +extern bool  intdata;
> +extern bool  longdata;
> +extern bool  nosign;
> +extern bool  randomize;

'boring', 'infinity' and 'randomize' are unused in getformat.c.


> +
> +int getformat(char *);
> +
> +/*
> + * Macros for converting digits to letters and vice versa
> + */
> +#define  to_digit(c) ((c) - '0')
> +#define  is_digit(c) ((unsigned)to_digit(c) <= 9)

I would prefer this to be less magic. How about something like this
instead?

#define is_digit(c) ((c) >= '0' && (c) <= '9')

This would also allow to remove the to_digit() macro.


> +
> +int
> +getformat(char *fmt)
> +{
> + int ch; /* character from fmt */
> + wchar_t wc;
> + mbstate_t ps;
> + size_t sz;
> + bool firsttime = true;
> +
> + sz = sizeof(fmt) - strlen(fmt) - 1;

The sizeof() doesn't do what you expect it to do here. 'fmt' is just a
pointer here, so the value returned is far to low. What we want is the
size of the original array instead.

$ jot -w 'xxx' 10
jot: -w word too long


> +
> + memset(, 0, sizeof(ps));
> + /*
> +  * Scan the