On Fri, May 10, 2013 at 08:04:57AM +0100, Stuart Henderson wrote:
> On 2013/05/10 13:18, Damien Miller wrote:
> > On Wed, 8 May 2013, Ted Unangst wrote:
> >
> > > On Tue, Apr 30, 2013 at 18:57, Arto Jonsson wrote:
> > > > Taken from netbsd with minor modifications. Comments?
> > >
> > > I don't think you've received much feedback. I don't know how other
> > > developers feel, but the question I have is can't this be done with a
> > > rather simple awk script? or perl? One of the reasons we have perl in
> > > base is precisely so it can be used for things like this.
> >
> > This implementation has the benefits of being small, having existing
> > maintainers (NetBSD) and already having been written and debugged. It
> > seems like make-work to do it over in Perl.
>
> If we do use this implementation, then pascal@'s version from 2011 added
> some fixes from FreeBSD, http://comments.gmane.org/gmane.os.openbsd.tech/25740

Here's an updated diff. Compared to the previous diff, '-' is now
handled as stdin. Looking at the FreeBSD version, I noticed that the
previous diff also had a useless exit() call, which I removed.

Comments?

Index: Makefile
===================================================================
RCS file: /cvs/src/usr.bin/Makefile,v
retrieving revision 1.129
diff -u -p -r1.129 Makefile
--- Makefile 15 Mar 2013 06:01:41 -0000 1.129
+++ Makefile 10 May 2013 14:09:23 -0000
@@ -16,7 +16,7 @@ SUBDIR= apply apropos ar arch asa asn1_c
 	m4 mail make man mandoc mesg mg \
 	midiplay mixerctl mkdep mklocale mkstr mktemp modstat nc netstat \
 	newsyslog \
-	nfsstat nice nm nohup oldrdist pagesize passwd paste patch pctr \
+	nfsstat nice nm nl nohup oldrdist pagesize passwd paste patch pctr \
 	pkg-config pkill \
 	pr printenv printf quota radioctl ranlib rcs rdist rdistd \
 	readlink renice rev rpcgen rpcinfo rs rsh rup ruptime rusers rwall \
Index: nl/Makefile
===================================================================
RCS file: nl/Makefile
diff -N nl/Makefile
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ nl/Makefile 10 May 2013 14:09:24 -0000
@@ -0,0 +1,6 @@
+# $OpenBSD$
+# $NetBSD: Makefile,v 1.4 2011/08/16 12:00:46 christos Exp $
+
+PROG= nl
+
+.include <bsd.prog.mk>
Index: nl/nl.1
===================================================================
RCS file: nl/nl.1
diff -N nl/nl.1
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ nl/nl.1 10 May 2013 14:09:24 -0000
@@ -0,0 +1,212 @@
+.\" $OpenBSD$
+.\" $NetBSD: nl.1,v 1.12 2012/04/08 22:00:39 wiz Exp $
+.\"
+.\" Copyright (c) 1999 The NetBSD Foundation, Inc.
+.\" All rights reserved.
+.\"
+.\" This code is derived from software contributed to The NetBSD Foundation
+.\" by Klaus Klein.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+.\" POSSIBILITY OF SUCH DAMAGE.
+.\"
+.Dd $Mdocdate$
+.Dt NL 1
+.Os
+.Sh NAME
+.Nm nl
+.Nd line numbering filter
+.Sh SYNOPSIS
+.Nm
+.Op Fl p
+.Op Fl b Ar type
+.Op Fl d Ar delim
+.Op Fl f Ar type
+.Op Fl h Ar type
+.Op Fl i Ar incr
+.Op Fl l Ar num
+.Op Fl n Ar format
+.Op Fl s Ar sep
+.Op Fl v Ar startnum
+.Op Fl w Ar width
+.Op Ar file
+.Sh DESCRIPTION
+The
+.Nm
+utility reads lines from the named
+.Ar file
+or the standard input if the
+.Ar file
+argument is omitted,
+applies a configurable line numbering filter operation and writes the result
+to the standard output.
+.Pp
+The
+.Nm
+utility treats the text it reads in terms of logical pages.
+Unless specified otherwise, line numbering is reset at the start of each
+logical page.
+A logical page consists of a header, a body and a footer section; empty
+sections are valid.
+Different line numbering options are independently available for header,
+body and footer sections.
+.Pp
+The starts of logical page sections are signaled by input lines containing
+nothing but one of the following sequences of delimiter characters:
+.Bd -unfilled -offset indent
+.Bl -column "\e:\e:\e: " "header "
+.It Em "Line" "Start of"
+.It \e:\e:\e: header
+.It \e:\e: body
+.It \e: footer
+.El
+.Ed
+.Pp
+If the input does not contain any logical page section signaling directives,
+the text being read is assumed to consist of a single logical page body.
+.Pp
+The following options are available:
+.Bl -tag -width indent
+.It Fl b Ar type
+Specify the logical page body lines to be numbered.
+Recognized
+.Ar type
+arguments are:
+.Bl -tag -width pstringXX
+.It a
+Number all lines.
+.It t
+Number only non-empty lines.
+.It n
+No line numbering.
+.It p Ns Ar expr
+Number only those lines that contain the basic regular expression specified
+by
+.Ar expr .
+.El
+.Pp
+The default
+.Ar type
+for logical page body lines is t.
+.It Fl d Ar delim
+Specify the delimiter characters used to indicate the start of a logical
+page section in the input file.
+At most two characters may be specified; if only one character is specified,
+the first character is replaced and the second character remains unchanged.
+The default
+.Ar delim
+characters are ``\e:''.
+.It Fl f Ar type
+Specify the same as
+.Fl b Ar type
+except for logical page footer lines.
+The default
+.Ar type
+for logical page footer lines is n.
+.It Fl h Ar type
+Specify the same as
+.Fl b Ar type
+except for logical page header lines.
+The default
+.Ar type
+for logical page header lines is n.
+.It Fl i Ar incr
+Specify the increment value used to number logical page lines.
+The default
+.Ar incr
+value is 1.
+.It Fl l Ar num
+If numbering of all lines is specified for the current logical section
+using the corresponding
+.Fl b
+a,
+.Fl f
+a
+or
+.Fl h
+a
+option,
+specify the number of adjacent blank lines to be considered as one.
+For example,
+.Fl l
+2 results in only the second adjacent blank line being numbered.
+The default
+.Ar num
+value is 1.
+.It Fl n Ar format
+Specify the line numbering output format.
+Recognized
+.Ar format
+arguments are:
+.Bl -tag -width lnXX -compact
+.It ln
+Left justified.
+.It rn
+Right justified, leading zeros suppressed.
+.It rz
+Right justified, leading zeros kept.
+.El
+.Pp
+The default
+.Ar format
+is rn.
+.It Fl p
+Specify that line numbering should not be restarted at logical page delimiters.
+.It Fl s Ar sep
+Specify the characters used in separating the line number and the corresponding
+text line.
+The default
+.Ar sep
+setting is a single tab character.
+.It Fl v Ar startnum
+Specify the initial value used to number logical page lines; see also the
+description of the
+.Fl p
+option.
+The default
+.Ar startnum
+value is 1.
+.It Fl w Ar width
+Specify the number of characters to be occupied by the line number;
+in case the
+.Ar width
+is insufficient to hold the line number, it will be truncated to its
+.Ar width
+least significant digits.
+The default
+.Ar width
+is 6.
+.El
+.Sh EXIT STATUS
+.Ex -std
+.Sh SEE ALSO
+.Xr pr 1
+.Sh STANDARDS
+The
+.Nm
+utility is compliant with the
+.St -p1003.1-2008
+specification.
+.Sh HISTORY
+The
+.Nm
+utility first appeared in
+.At V.2 .
Index: nl/nl.c
===================================================================
RCS file: nl/nl.c
diff -N nl/nl.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ nl/nl.c 10 May 2013 14:09:24 -0000
@@ -0,0 +1,384 @@
+/* $OpenBSD$ */
+/* $NetBSD: nl.c,v 1.11 2011/08/16 12:00:46 christos Exp $ */
+
+/*-
+ * Copyright (c) 1999 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to The NetBSD Foundation
+ * by Klaus Klein.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+
+#include <err.h>
+#include <limits.h>
+#include <locale.h>
+#include <regex.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+typedef enum {
+	number_all,		/* number all lines */
+	number_nonempty,	/* number non-empty lines */
+	number_none,		/* no line numbering */
+	number_regex		/* number lines matching regular expression */
+} numbering_type;
+
+struct numbering_property {
+	const char * const name;	/* for diagnostics */
+	numbering_type type;		/* numbering type */
+	regex_t expr;			/* for type == number_regex */
+};
+
+/* line numbering formats */
+#define FORMAT_LN	"%-*d"	/* left justified, leading zeros suppressed */
+#define FORMAT_RN	"%*d"	/* right justified, leading zeros suppressed */
+#define FORMAT_RZ	"%0*d"	/* right justified, leading zeros kept */
+
+#define FOOTER		0
+#define BODY		1
+#define HEADER		2
+#define NP_LAST		HEADER
+
+static struct numbering_property numbering_properties[NP_LAST + 1] = {
+	{ "footer", number_none, { 0, 0, 0, 0 } },
+	{ "body", number_nonempty, { 0, 0, 0, 0 } },
+	{ "header", number_none, { 0, 0, 0, 0 } },
+};
+
+#define max(a, b) ((a) > (b) ? (a) : (b))
+
+/*
+ * Maximum number of characters required for a decimal representation of a
+ * (signed) int; courtesy of tzcode.
+ */
+#define INT_STRLEN_MAXIMUM \
+	((sizeof (int) * CHAR_BIT - 1) * 302 / 1000 + 2)
+
+static void filter(void);
+static void parse_numbering(const char *, int);
+static __dead void usage(void);
+
+/*
+ * Pointer to dynamically allocated input line buffer, and its size.
+ */
+static char *buffer;
+static size_t buffersize;
+
+/*
+ * Dynamically allocated buffer suitable for string representation of ints.
+ */
+static char *intbuffer;
+static size_t intbuffersize;
+
+/*
+ * Configurable parameters.
+ */
+/* delimiter characters that indicate the start of a logical page section */
+static char delim[2] = { '\\', ':' };
+
+/* line numbering format */
+static const char *format = FORMAT_RN;
+
+/* increment value used to number logical page lines */
+static int incr = 1;
+
+/* number of adjacent blank lines to be considered (and numbered) as one */
+static unsigned int nblank = 1;
+
+/* whether to restart numbering at logical page delimiters */
+static int restart = 1;
+
+/* characters used in separating the line number and the corrsp. text line */
+static const char *sep = "\t";
+
+/* initial value used to number logical page lines */
+static int startnum = 1;
+
+/* number of characters to be used for the line number */
+/* should be unsigned but required signed by `*' precision conversion */
+static int width = 6;
+
+
+int
+main(int argc, char *argv[])
+{
+	int c;
+	long val;
+	const char *errstr;
+
+	(void)setlocale(LC_ALL, "");
+
+	/*
+	 * Note: this implementation strictly conforms to the XBD Utility
+	 * Syntax Guidelines and does not permit the optional `file' operand
+	 * to be intermingled with the options, which is defined in the
+	 * XCU specification (Issue 5) but declared an obsolescent feature that
+	 * will be removed from a future issue. It shouldn't matter, though.
+	 */
+	while ((c = getopt(argc, argv, "pb:d:f:h:i:l:n:s:v:w:")) != -1) {
+		switch (c) {
+		case 'p':
+			restart = 0;
+			break;
+		case 'b':
+			parse_numbering(optarg, BODY);
+			break;
+		case 'd':
+			if (optarg[0] != '\0')
+				delim[0] = optarg[0];
+			if (optarg[1] != '\0')
+				delim[1] = optarg[1];
+			/* at most two delimiter characters */
+			if (optarg[2] != '\0') {
+				errx(EXIT_FAILURE,
+				    "invalid delim argument -- %s",
+				    optarg);
+				/* NOTREACHED */
+			}
+			break;
+		case 'f':
+			parse_numbering(optarg, FOOTER);
+			break;
+		case 'h':
+			parse_numbering(optarg, HEADER);
+			break;
+		case 'i':
+			incr = strtonum(optarg, INT_MIN, INT_MAX, &errstr);
+			if (errstr)
+				errx(EXIT_FAILURE, "increment value is %s: %s",
+				    errstr, optarg);
+			break;
+		case 'l':
+			nblank = strtonum(optarg, 0, UINT_MAX, &errstr);
+			if (errstr)
+				errx(EXIT_FAILURE,
+				    "blank line value is %s: %s",
+				    errstr, optarg);
+			break;
+		case 'n':
+			if (strcmp(optarg, "ln") == 0) {
+				format = FORMAT_LN;
+			} else if (strcmp(optarg, "rn") == 0) {
+				format = FORMAT_RN;
+			} else if (strcmp(optarg, "rz") == 0) {
+				format = FORMAT_RZ;
+			} else
+				errx(EXIT_FAILURE,
+				    "illegal format -- %s", optarg);
+			break;
+		case 's':
+			sep = optarg;
+			break;
+		case 'v':
+			startnum = strtonum(optarg, INT_MIN, INT_MAX, &errstr);
+			if (errstr)
+				errx(EXIT_FAILURE,
+				    "initial logical page value is %s: %s",
+				    errstr, optarg);
+			break;
+		case 'w':
+			width = strtonum(optarg, 1, INT_MAX, &errstr);
+			if (errstr)
+				errx(EXIT_FAILURE, "width is %s: %s", errstr,
+				    optarg);
+			break;
+		case '?':
+		default:
+			usage();
+			/* NOTREACHED */
+		}
+	}
+	argc -= optind;
+	argv += optind;
+
+	switch (argc) {
+	case 0:
+		break;
+	case 1:
+		if (strcmp(argv[0], "-") != 0 &&
+		    freopen(argv[0], "r", stdin) == NULL)
+			err(EXIT_FAILURE, "Cannot open `%s'", argv[0]);
+		break;
+	default:
+		usage();
+		/* NOTREACHED */
+	}
+
+	/* Determine the maximum input line length to operate on. */
+	if ((val = sysconf(_SC_LINE_MAX)) == -1) /* ignore errno */
+		val = LINE_MAX;
+	/* Allocate sufficient buffer space (including the terminating NUL). */
+	buffersize = (size_t)val + 1;
+	if ((buffer = malloc(buffersize)) == NULL)
+		err(EXIT_FAILURE, "Cannot allocate input line buffer");
+
+	/* Allocate a buffer suitable for preformatting line number. */
+	intbuffersize = max((int)INT_STRLEN_MAXIMUM, width) + 1; /* NUL */
+	if ((intbuffer = malloc(intbuffersize)) == NULL)
+		err(EXIT_FAILURE, "cannot allocate preformatting buffer");
+
+	/* Do the work. */
+	filter();
+
+	return EXIT_SUCCESS;
+	/* NOTREACHED */
+}
+
+static void
+filter(void)
+{
+	int line;		/* logical line number */
+	int section;		/* logical page section */
+	unsigned int adjblank;	/* adjacent blank lines */
+	int consumed;		/* intbuffer measurement */
+	int donumber, idx;
+
+	adjblank = 0;
+	line = startnum;
+	section = BODY;
+
+	while (fgets(buffer, (int)buffersize, stdin) != NULL) {
+		for (idx = FOOTER; idx <= NP_LAST; idx++) {
+			/* Does it look like a delimiter? */
+			if (buffer[2 * idx + 0] == delim[0] &&
+			    buffer[2 * idx + 1] == delim[1]) {
+				/* Was this the whole line? */
+				if (buffer[2 * idx + 2] == '\n') {
+					section = idx;
+					adjblank = 0;
+					if (restart)
+						line = startnum;
+					goto nextline;
+				}
+			} else {
+				break;
+			}
+		}
+
+		switch (numbering_properties[section].type) {
+		case number_all:
+			/*
+			 * Doing this for number_all only is disputable, but
+			 * the standard expresses an explicit dependency on
+			 * `-b a' etc.
+			 */
+			if (buffer[0] == '\n' && ++adjblank < nblank)
+				donumber = 0;
+			else
+				donumber = 1, adjblank = 0;
+			break;
+		case number_nonempty:
+			donumber = (buffer[0] != '\n');
+			break;
+		case number_none:
+			donumber = 0;
+			break;
+		case number_regex:
+			donumber =
+			    (regexec(&numbering_properties[section].expr,
+			    buffer, 0, NULL, 0) == 0);
+			break;
+		}
+
+		if (donumber) {
+			consumed = snprintf(intbuffer, intbuffersize, format,
+			    width, line);
+			(void)printf("%s",
+			    intbuffer + max(0, consumed - width));
+			line += incr;
+		} else {
+			(void)printf("%*s", width, "");
+		}
+		(void)printf("%s%s", sep, buffer);
+
+		if (ferror(stdout))
+			err(EXIT_FAILURE, "output error");
+nextline:
+		;
+	}
+
+	if (ferror(stdin))
+		err(EXIT_FAILURE, "input error");
+}
+
+/*
+ * Various support functions.
+ */
+
+static void
+parse_numbering(const char *argstr, int section)
+{
+	int error;
+	char errorbuf[NL_TEXTMAX];
+
+	switch (argstr[0]) {
+	case 'a':
+		numbering_properties[section].type = number_all;
+		break;
+	case 'n':
+		numbering_properties[section].type = number_none;
+		break;
+	case 't':
+		numbering_properties[section].type = number_nonempty;
+		break;
+	case 'p':
+		/* If there was a previous expression, throw it away. */
+		if (numbering_properties[section].type == number_regex)
+			regfree(&numbering_properties[section].expr);
+		else
+			numbering_properties[section].type = number_regex;
+
+		/* Compile/validate the supplied regular expression. */
+		if ((error = regcomp(&numbering_properties[section].expr,
+		    &argstr[1], REG_NEWLINE|REG_NOSUB)) != 0) {
+			(void)regerror(error,
+			    &numbering_properties[section].expr,
+			    errorbuf, sizeof (errorbuf));
+			errx(EXIT_FAILURE,
+			    "%s expr: %s -- %s",
+			    numbering_properties[section].name, errorbuf,
+			    &argstr[1]);
+		}
+		break;
+	default:
+		errx(EXIT_FAILURE,
+		    "illegal %s line numbering type -- %s",
+		    numbering_properties[section].name, argstr);
+	}
+}
+
+static __dead void
+usage(void)
+{
+	extern char *__progname;
+
+	(void)fprintf(stderr, "usage: %s [-p] [-b type] [-d delim] [-f type] "
+	    "[-h type] [-i incr] [-l num]\n\t[-n format] [-s sep] "
+	    "[-v startnum] [-w width] [file]\n", __progname);
+	exit(EXIT_FAILURE);
+}
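
For anyone reviewing the -w handling: nl.1 says a number that does not fit
is "truncated to its width least significant digits", and filter() appears
to get that by formatting into intbuffer and then skipping the leading
consumed - width characters. The following is only a minimal standalone
sketch of that one step, not part of the diff. nl_format() is a
hypothetical helper name made up for illustration, and unlike filter() it
returns NULL when the caller's buffer is too small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in the diff): format `line` with one of the
 * FORMAT_* style strings ("%-*d", "%*d", "%0*d") and return a pointer to
 * the part of `buf` that would actually be printed for the given width.
 */
static const char *
nl_format(char *buf, size_t bufsize, const char *fmt, int width, int line)
{
	int consumed;

	assert(width >= 1);	/* mirrors the lower bound given to -w */

	consumed = snprintf(buf, bufsize, fmt, width, line);
	if (consumed < 0 || (size_t)consumed >= bufsize)
		return NULL;	/* output error or buffer too small */

	/*
	 * When the formatted number is longer than `width`, skip the excess
	 * leading characters so only the trailing `width` ones remain.
	 */
	return buf + (consumed > width ? consumed - width : 0);
}
```

With a 32-byte buffer, nl_format(buf, sizeof(buf), "%*d", 3, 12345) points
at "345", while a width of 6 and line 42 gives "    42".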