Inspired by Paul Goyette's question (on netbsd-users) I took a look at sort, and I'd like to commit the following updates if no-one objects.
The only changes that should affect anything are the addition of the posix C option, which is identical to c, but doesn't write messages to stderr if the input file is not sorted, and fixing bugs in the processing of -R such that if -R is used (and without setting it to \n - the processing of which is also fixed in case it is set that wat using -R 10) \n does not become a field separator regardless of what might be set later with -t (if -t preceded -R it would have worked correctly, but not the other way.) Aside from that the changes are more or less cosmetic - they enforce using only one of -c -C and -m (which make no sense used together), make the usage() reflect reality (including formatting it to stop assuming it is outputting to an 80 column display..., and reflect the man page changes mentioned next), and fix a minor bug in a comment, removed the unused 'x' option (what was that?) from SORT_OPTS (no effect, generates usage() either way) and sorted the option processing (R comes before S...) In the man page, -C is documented, the synopsis is split to show the (only one file allowed) different usage for -C or -c, and perhaps most importantly, the names "field1" and "filed2" are changed to "kstart" and "kend" to make it (a little more) clear that the -k argument does not specify or use a field as such, but designates the start and end of the sort keys (with the designators using fields as an addressing object - which is all fields are used for in sort, unlike awk, cut, etc.) and -R is fully documented. There are no changes (at all) to anything actually related to sorting... Anyone object to these changes? (patch appended) kre Index: msort.c =================================================================== RCS file: /cvsroot/src/usr.bin/sort/msort.c,v retrieving revision 1.30 diff -u -r1.30 msort.c --- msort.c 5 Feb 2010 21:58:42 -0000 1.30 +++ msort.c 29 May 2016 05:00:34 -0000 @@ -365,7 +365,7 @@ * check order on one file */ void -order(struct filelist *filelist, struct field *ftbl) +order(struct filelist *filelist, struct field *ftbl, int quiet) { get_func_t get = SINGL_FLD ? makeline : makekey; RECHEADER *crec, *prec, *trec; @@ -387,10 +387,14 @@ exit(0); while (get(fp, crec, crec_end, ftbl) == 0) { if (0 < (c = cmp(prec, crec))) { + if (quiet) + exit(1); crec->data[crec->length-1] = 0; errx(1, "found disorder: %s", crec->data+crec->offset); } if (UNIQUE && !c) { + if (quiet) + exit(1); crec->data[crec->length-1] = 0; errx(1, "found non-uniqueness: %s", crec->data+crec->offset); Index: sort.1 =================================================================== RCS file: /cvsroot/src/usr.bin/sort/sort.1,v retrieving revision 1.34 diff -u -r1.34 sort.1 --- sort.1 29 May 2013 15:00:35 -0000 1.34 +++ sort.1 29 May 2016 05:00:34 -0000 @@ -67,16 +67,26 @@ .Nd sort or merge text files .Sh SYNOPSIS .Nm -.Op Fl bcdfHilmnrSsu +.Op Fl bdfHilmnrSsu .Oo .Fl k -.Ar field1 Ns Op Li \&, Ns Ar field2 +.Ar kstart Ns Op Li \&, Ns Ar kend .Oc .Op Fl o Ar output .Op Fl R Ar char .Op Fl T Ar dir .Op Fl t Ar char .Op Ar +.Nm +.Fl c|C +.Op Fl bdfilnru +.Oo +.Fl k +.Ar kstart Ns Op Li \&, Ns Ar kend +.Op Fl t Ar char +.Oc +.Op Fl R Ar char +.Op Ar file .Sh DESCRIPTION The .Nm @@ -101,6 +111,10 @@ produces no output. See also .Fl u . +.It Fl C +Identical to +.Fl c +without the error messages in the case of unsorted input. .It Fl H Ignored for compatibility with earlier versions of .Nm . @@ -137,9 +151,13 @@ option, check that there are no lines with duplicate keys. .El .Pp -The following options override the default ordering rules. -When ordering options appear independent of key field -specifications, the requested field ordering rules are +The following options, +which should be given before any +.Fl k +options, override the default ordering rules. +When ordering options appear independent of, +and before, key field specifications, +the requested field ordering rules are applied globally to all sort keys. When attached to a specific key (see .Fl k ) , @@ -224,12 +242,21 @@ This should be used with discretion; .Fl R Aq Ar alphanumeric usually produces undesirable results. +If char is not a single character, then it +specifies the value of the desired record +separator as an integer specified in any +of the normal NNN, 0ooo, or 0xXXX ways, +or as an octal value preceded by \e. +Caution: do not attempt to specify Ctl-A +as +.Dq -R 1 +which will not do what was intended at all! The default record separator is newline. -.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2 +.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend Designates the starting position, -.Ar field1 , +.Ar kstart , and optional ending position, -.Ar field2 , +.Ar kend , of a key field. The .Fl k @@ -265,16 +292,16 @@ Fields are specified by the .Fl k -.Ar field1 Ns Op \&, Ns Ar field2 +.Ar kstart Ns Op \&, Ns Ar kend argument. A missing -.Ar field2 +.Ar kend argument defaults to the end of a line. .Pp The arguments -.Ar field1 +.Ar kstart and -.Ar field2 +.Ar kend have the form .Ar m Ns Li \&. Ns Ar n and can be followed by one or more of the letters @@ -284,7 +311,7 @@ .Cm r , which correspond to the options discussed above. A -.Ar field1 +.Ar kstart position specified by .Ar m Ns Li \&. Ns Ar n .Pq Ar m , n No \*[Gt] 0 @@ -296,7 +323,7 @@ A missing .Li \&. Ns Ar n in -.Ar field1 +.Ar kstart means .Ql \&.1 , indicating the first character of the @@ -314,7 +341,7 @@ field. .Pp A -.Ar field2 +.Ar kend position specified by .Ar m Ns Li \&. Ns Ar n is interpreted as @@ -451,7 +478,7 @@ Thus performance depends highly on efficient choice of sort keys, and the .Fl b option and the -.Ar field2 +.Ar kend argument of the .Fl k option should be used whenever possible. Index: sort.c =================================================================== RCS file: /cvsroot/src/usr.bin/sort/sort.c,v retrieving revision 1.61 diff -u -r1.61 sort.c --- sort.c 16 Sep 2011 15:39:29 -0000 1.61 +++ sort.c 29 May 2016 05:00:34 -0000 @@ -117,7 +117,7 @@ main(int argc, char *argv[]) { int ch, i, stdinflag = 0; - char cflag = 0, mflag = 0; + char mode = 0; char *outfile, *outpath = 0; struct field *fldtab; size_t fldtab_sz, fld_cnt; @@ -145,9 +145,9 @@ fldtab = emalloc(fldtab_sz * sizeof(*fldtab)); memset(fldtab, 0, fldtab_sz * sizeof(*fldtab)); -#define SORT_OPTS "bcdD:fHik:lmno:rR:sSt:T:ux" +#define SORT_OPTS "bcCdD:fHik:lmno:rR:sSt:T:u" - /* Convert "+field" args to -f format */ + /* Convert "+field" args to -k format */ fixit(&argc, argv, SORT_OPTS); if (!(tmpdir = getenv("TMPDIR"))) @@ -158,8 +158,10 @@ case 'b': fldtab[0].flags |= BI | BT; break; - case 'c': - cflag = 1; + case 'c': case 'C': case 'm': + if (mode) + usage("Incompatible operation modes"); + mode = ch; break; case 'D': /* Debug flags */ for (i = 0; optarg[i]; i++) @@ -179,15 +181,33 @@ setfield(optarg, &fldtab[++fld_cnt], fldtab[0].flags); break; - case 'm': - mflag = 1; - break; case 'o': outpath = optarg; break; case 'r': REVERSE = 1; break; + case 'R': + if (REC_D != '\n') + usage("multiple record delimiters"); + REC_D = *optarg; + if (optarg[1] != '\0') { + char *ep; + int t = 0; + + if (optarg[0] == '\\') + optarg++, t = 8; + REC_D = (int)strtol(optarg, &ep, t); + if (*ep != '\0' || REC_D < 0 || + REC_D >= (int)__arraycount(d_mask)) + errx(2, "invalid record delimiter %s", + optarg); + } + if (REC_D == '\n') + break; + d_mask['\n'] = d_mask[' ']; + d_mask[REC_D] = REC_D_F; + break; case 's': /* * Nominally 'stable sort', keep lines with equal keys @@ -213,30 +233,11 @@ SEP_FLAG = 1; d_mask[' '] &= ~FLD_D; d_mask['\t'] &= ~FLD_D; + d_mask['\n'] &= ~FLD_D; d_mask[(u_char)*optarg] |= FLD_D; if (d_mask[(u_char)*optarg] & REC_D_F) errx(2, "record/field delimiter clash"); break; - case 'R': - if (REC_D != '\n') - usage("multiple record delimiters"); - REC_D = *optarg; - if (REC_D == '\n') - break; - if (optarg[1] != '\0') { - char *ep; - int t = 0; - if (optarg[0] == '\\') - optarg++, t = 8; - REC_D = (int)strtol(optarg, &ep, t); - if (*ep != '\0' || REC_D < 0 || - REC_D >= (int)__arraycount(d_mask)) - errx(2, "invalid record delimiter %s", - optarg); - } - d_mask['\n'] = d_mask[' ']; - d_mask[REC_D] = REC_D_F; - break; case 'T': /* -T tmpdir */ tmpdir = optarg; @@ -254,13 +255,13 @@ /* Don't sort on raw record if keys match */ posix_sort = 0; - if (cflag && argc > optind+1) + if ((mode == 'c' || mode == 'C') && argc > optind+1) errx(2, "too many input files for -c option"); if (argc - 2 > optind && !strcmp(argv[argc-2], "-o")) { outpath = argv[argc-1]; argc -= 2; } - if (mflag && argc - optind > (MAXFCT - (16+1))*16) + if (mode == 'm' && argc - optind > (MAXFCT - (16+1))*16) errx(2, "too many input files for -m option"); for (i = optind; i < argc; i++) { @@ -309,8 +310,8 @@ num_input_files = argc - optind; } - if (cflag) { - order(&filelist, fldtab); + if (mode == 'c' || mode == 'C') { + order(&filelist, fldtab, mode == 'C'); /* NOT REACHED */ } @@ -348,7 +349,7 @@ err(2, "output file %s", outfile); } - if (mflag) + if (mode == 'm') fmerge(&filelist, num_input_files, outfp, fldtab); else fsort(&filelist, num_input_files, outfp, fldtab); @@ -393,13 +394,20 @@ static void usage(const char *msg) { + const char *pn = getprogname(); + if (msg != NULL) (void)fprintf(stderr, "%s: %s\n", getprogname(), msg); (void)fprintf(stderr, - "usage: %s [-bcdfHilmnrSsu] [-k field1[,field2]] [-o output]" - " [-R char] [-T dir]", getprogname()); + "usage: %s [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output]" + " [-R char] [-T dir]\n", pn); (void)fprintf(stderr, " [-t char] [file ...]\n"); + (void)fprintf(stderr, + " or: %s -[cC] [-bdfilnru] [-k kstart[,kend]] [-o output]" + " [-R char]\n", pn); + (void)fprintf(stderr, + " [-t char] [file]\n"); exit(2); } Index: sort.h =================================================================== RCS file: /cvsroot/src/usr.bin/sort/sort.h,v retrieving revision 1.35 diff -u -r1.35 sort.h --- sort.h 5 Aug 2015 07:10:03 -0000 1.35 +++ sort.h 29 May 2016 05:00:34 -0000 @@ -191,7 +191,7 @@ int makeline(FILE *, RECHEADER *, u_char *, struct field *); void makeline_copydown(RECHEADER *); int optval(int, int); -__dead void order(struct filelist *, struct field *); +__dead void order(struct filelist *, struct field *, int); void putline(const RECHEADER *, FILE *); void putrec(const RECHEADER *, FILE *); void putkeydump(const RECHEADER *, FILE *);