Re: new getformat() for jot(1)
On Wed, Sep 07, 2016 at 12:00:17AM +0200, Martin Natano wrote: > On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote: > > The -w flag to jot() allows the user to specify a format string that > > will be passed to printf after it was verified that it contains only one > > conversion specifier. There are some subtle differences what jot's > > getformat() accepts and what printf will do. Thus, I took vfprintf.c and > > carved out whatever is unneeded for jot itself. The result is a slightly > > less ugly function in a separate file. > > Please see some comments below. > > I really tried to understand all the corner cases in the getformat() > function, but couldn't wrap my head around it. I believe it would be > best to just axe the -w flag. Yes, there are probably scripts out there > using it, but I think carrying that burden around is not worth it. Every > possible implementation either is an adapted reimplementation of printf, > or whitewashing the string before passing it to printf(). I would like > to remind of the patch(1) ed script issue that resulted in shell > injection, just because the whitewash code and the actual parser where > not in sync. It's a losing game. Thanks for taking the time of looking into it. I would very much like to ditch -w if there are no objections. It's just plain horrific and a terrible idea to begin with. I can understand the convenience of something like "jot -w '%02d' 10", but everything more complicated seems to be better left to awk, perl or whatever else your preferred scripting language is. There is a similar printf reimplementation in usr.bin/printf/prinf.c which is just as unpleasant, but mandated by POSIX. Below is a diff addressing your comments. > 'boring', 'infinity' and 'randomize' are unused in getformat.c. yes. > How about something like this instead? > > #define is_digit(c) ((c) >= '0' && (c) <= '9') > > This would also allow to remove the to_digit() macro. I agree. > > + sz = sizeof(fmt) - strlen(fmt) - 1; > > The sizeof() doesn't do what you expect it to do here. that was really stupid, thanks. > Previously the error message for 'jot -w %d%d 10' was "jot: too many > conversions", now it is "jot: illegal or unsupported format '%d%d'". I > think the previous error was more helpful. restored previous error message > > + if (ch == '$') > > What is this check supposed to do? There is no '$' case, so the default > will be invoked after 'got reswitch;'. incomplete purging of cases unneeded for jot Index: Makefile === RCS file: /var/cvs/src/usr.bin/jot/Makefile,v retrieving revision 1.5 diff -u -p -r1.5 Makefile --- Makefile10 Jan 2016 01:15:52 - 1.5 +++ Makefile6 Sep 2016 22:35:48 - @@ -1,6 +1,7 @@ # $OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $ PROG= jot +SRCS= getformat.c jot.c CFLAGS+= -Wall LDADD+=-lm DPADD+=${LIBM} Index: getformat.c === RCS file: getformat.c diff -N getformat.c --- /dev/null 1 Jan 1970 00:00:00 - +++ getformat.c 6 Sep 2016 22:35:58 - @@ -0,0 +1,180 @@ +/* $OpenBSD$ */ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * Chris Torek. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of the University nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* Based on
Re: new getformat() for jot(1)
On Sat, Sep 03, 2016 at 04:55:59PM +0100, Theo Buehler wrote: > The -w flag to jot() allows the user to specify a format string that > will be passed to printf after it was verified that it contains only one > conversion specifier. There are some subtle differences what jot's > getformat() accepts and what printf will do. Thus, I took vfprintf.c and > carved out whatever is unneeded for jot itself. The result is a slightly > less ugly function in a separate file. Please see some comments below. I really tried to understand all the corner cases in the getformat() function, but couldn't wrap my head around it. I believe it would be best to just axe the -w flag. Yes, there are probably scripts out there using it, but I think carrying that burden around is not worth it. Every possible implementation either is an adapted reimplementation of printf, or whitewashing the string before passing it to printf(). I would like to remind of the patch(1) ed script issue that resulted in shell injection, just because the whitewash code and the actual parser where not in sync. It's a losing game. natano > > Index: Makefile > === > RCS file: /var/cvs/src/usr.bin/jot/Makefile,v > retrieving revision 1.5 > diff -u -p -r1.5 Makefile > --- Makefile 10 Jan 2016 01:15:52 - 1.5 > +++ Makefile 3 Sep 2016 15:48:06 - > @@ -1,6 +1,7 @@ > #$OpenBSD: Makefile,v 1.5 2016/01/10 01:15:52 tb Exp $ > > PROG=jot > +SRCS=getformat.c jot.c > CFLAGS+= -Wall > LDADD+= -lm > DPADD+= ${LIBM} > Index: getformat.c > === > RCS file: getformat.c > diff -N getformat.c > --- /dev/null 1 Jan 1970 00:00:00 - > +++ getformat.c 3 Sep 2016 15:55:21 - > @@ -0,0 +1,188 @@ > +/* $OpenBSD$ */ > +/*- > + * Copyright (c) 1990 The Regents of the University of California. > + * All rights reserved. > + * > + * This code is derived from software contributed to Berkeley by > + * Chris Torek. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + *notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + *notice, this list of conditions and the following disclaimer in the > + *documentation and/or other materials provided with the distribution. > + * 3. Neither the name of the University nor the names of its contributors > + *may be used to endorse or promote products derived from this software > + *without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE. > + */ > + > +/* Based on src/lib/libc/stdio/vfprintf.c r1.77 */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +extern int prec; > +extern bool boring; > +extern bool chardata; > +extern bool infinity; > +extern bool intdata; > +extern bool longdata; > +extern bool nosign; > +extern bool randomize; 'boring', 'infinity' and 'randomize' are unused in getformat.c. > + > +int getformat(char *); > + > +/* > + * Macros for converting digits to letters and vice versa > + */ > +#define to_digit(c) ((c) - '0') > +#define is_digit(c) ((unsigned)to_digit(c) <= 9) I would prefer this to be less magic. How about something like this instead? #define is_digit(c) ((c) >= '0' && (c) <= '9') This would also allow to remove the to_digit() macro. > + > +int > +getformat(char *fmt) > +{ > + int ch; /* character from fmt */ > + wchar_t wc; > + mbstate_t ps; > + size_t sz; > + bool firsttime = true; > + > + sz = sizeof(fmt) - strlen(fmt) - 1; The sizeof() doesn't do what you expect it to do here. 'fmt' is just a pointer here, so the value returned is far to low. What we want is the size of the original array instead. $ jot -w 'xxx' 10 jot: -w word too long > + > + memset(, 0, sizeof(ps)); > + /* > + * Scan the