Dan Jacobson <[EMAIL PROTECTED]> writes:

> $ echo a|sort -o x -o y
> $ ls
> y

POSIX allows this behavior, but it's admittedly weird.  I think that
option order should not matter, unless POSIX or the documentation
explicitly says otherwise.  So I propose the following patch.

While looking into this problem I noticed that sort's -t option
doesn't let you specify a NUL as a field separator (this is a related
issue since 'sort' uses 0 to represent "no option specified yet").
Also, the documentation and usage strings incorrectly say "white
space" in several places where they should say "blanks".  Here's a
patch for these problems.

2003-09-02  Paul Eggert  <[EMAIL PROTECTED]>

        * NEWS: sort -t '\0' now uses a NUL tab.
        sort option order no longer matters, unless POSIX requires it.
        * doc/coreutils.texi (sort invocation): -d now overrides -i.
        "whitespace" -> "blanks"; "whitespace" isn't correct.
        -t '\0' now specifies a NUL tab.
        * src/sort.c (usage): Say "blanks" instead of "whitespace".
        Similar fixes for many comments.
        (TAB_DEFAULT): New constant, so that we can support NUL as the
        field separator.
        (tab): Now int, not char.  Initialize to TAB_DEFAULT.
        (specify_sort_size): If multiple sizes are specified, use the
        largest.
        (begfield, limfield): Support NUL tab char.
        (set_ordering): Do not let -i override -d.
        (main): Report an error if incompatible -o or -t options are
        given.  Report an error for "-t ''".  Allow "-t '\0'" to
        specify a NUL tab.

Index: NEWS
===================================================================
RCS file: /cvsroot/coreutils/coreutils/NEWS,v
retrieving revision 1.124
diff -p -u -r1.124 NEWS
--- NEWS 27 Aug 2003 09:18:28 -0000 1.124
+++ NEWS 2 Sep 2003 22:50:50 -0000
@@ -13,6 +13,12 @@ GNU coreutils NEWS
   timestamps to their full nanosecond resolution; microsecond
   resolution is the best we can do right now.
 
+  sort now supports the zero byte (NUL) as a field separator; use -t '\0'.
+  The -t '' option, which formerly had no effect, is now an error.
+
+  sort option order no longer matters for the options -S, -d, -i, -o, and -t.
+  Stronger options override weaker, and incompatible options are diagnosed.
+
 ** Bug fixes
 
   stat no longer overruns a buffer for format strings ending in `%'
Index: doc/coreutils.texi
===================================================================
RCS file: /cvsroot/coreutils/coreutils/doc/coreutils.texi,v
retrieving revision 1.130
diff -p -u -r1.130 coreutils.texi
--- doc/coreutils.texi 17 Aug 2003 17:10:25 -0000 1.130
+++ doc/coreutils.texi 2 Sep 2003 22:51:09 -0000
@@ -2969,6 +2969,8 @@ converting to floating point.
 @vindex LC_CTYPE
 Ignore nonprinting characters.
 The @env{LC_CTYPE} locale determines character types.
+This option has no effect if the stronger @option{--dictionary-order}
+(@option{-d}) option is also given.
 
 @item -M
 @itemx --month-sort
@@ -2976,7 +2978,7 @@ The @env{LC_CTYPE} locale determines cha
 @opindex --month-sort
 @cindex months, sorting by
 @vindex LC_TIME
-An initial string, consisting of any amount of whitespace, followed
+An initial string, consisting of any amount of blanks, followed
 by a month name abbreviation, is folded to UPPER case and compared in the
 order @samp{JAN} < @samp{FEB} < @dots{} < @samp{DEC}.
 Invalid names compare low to valid names.  The @env{LC_TIME} locale
@@ -2989,7 +2991,7 @@ category determines the month spellings.
 @cindex numeric sort
 @vindex LC_NUMERIC
 Sort numerically: the number begins each line; specifically, it consists
-of optional whitespace, an optional @samp{-} sign, and zero or more
+of optional blanks, an optional @samp{-} sign, and zero or more
 digits possibly separated by thousands separators, optionally followed
 by a decimal-point character and zero or more digits.  The @env{LC_NUMERIC}
 locale specifies the decimal-point character and thousands separator.
@@ -3085,7 +3087,7 @@ than @var{size}.
 @cindex field separator character
 Use character @var{separator} as the field separator when finding the
 sort keys in each line.  By default, fields are separated by the empty
-string between a non-whitespace character and a whitespace character.
+string between a non-blank character and a blank character.
 That is, given the input line @[EMAIL PROTECTED] foo bar}}, @command{sort} breaks it
 into fields @[EMAIL PROTECTED] foo}} and @[EMAIL PROTECTED] bar}}.  The field separator is
 not considered to be part of either the field preceding or the field
@@ -3093,6 +3095,10 @@ following.  But note that sort fields th
 as @option{-k 2}, or sort fields consisting of a range, as @option{-k 2,3},
 retain the field separators present between the endpoints of the range.
 
+To specify a zero byte (@acronym{ASCII} @sc{nul} (Null) character) as
+the field separator, use the two-character string @samp{\0}, e.g.,
[EMAIL PROTECTED] -t '\0'}.
+
 @item -T @var{tempdir}
 @itemx [EMAIL PROTECTED]
 @opindex -T
@@ -3218,7 +3224,7 @@ field-end part of the key specifier.
 
 @item
 Sort the password file on the fifth field and ignore any
-leading white space.  Sort lines with equal values in field five
+leading blanks.  Sort lines with equal values in field five
 on the numeric user ID in field three.
 
 @example
@@ -3242,7 +3248,7 @@ The use of @option{-print0}, @option{-z}
 that pathnames that contain Line Feed characters will not get broken up
 by the sort operation.
 
-Finally, to ignore both leading and trailing white space, you
+Finally, to ignore both leading and trailing blanks, you
 could have applied the @samp{b} modifier to the field-end specifier
 for the first key,
 
Index: src/sort.c
===================================================================
RCS file: /cvsroot/coreutils/coreutils/src/sort.c,v
retrieving revision 1.267
diff -p -u -r1.267 sort.c
--- src/sort.c 4 Aug 2003 08:55:44 -0000 1.267
+++ src/sort.c 2 Sep 2003 22:56:17 -0000
@@ -146,8 +146,8 @@ struct keyfield
   size_t echar;                 /* Additional characters in field. */
   bool const *ignore;           /* Boolean array of characters to ignore. */
   char const *translate;        /* Translation applied to characters. */
-  bool skipsblanks;             /* Skip leading white space at start. */
-  bool skipeblanks;             /* Skip trailing white space at finish. */
+  bool skipsblanks;             /* Skip leading blanks at start. */
+  bool skipeblanks;             /* Skip trailing blanks at finish. */
   bool numeric;                 /* Flag for numeric comparison.  Handle
                                    strings of digits with optional decimal
                                    point, but no exponential notation. */
@@ -173,7 +173,7 @@ char *program_name;
    internally, but doing this with good performance is a bit
    tricky.  */
 
-/* Table of white space.  */
+/* Table of blanks.  */
 static bool blanks[UCHAR_LIM];
 
 /* Table of non-printing characters. */
@@ -243,10 +243,13 @@ static bool reverse;
    they were read if all keys compare equal.  */
 static bool stable;
 
-/* Tab character separating fields.  If NUL, then fields are separated
-   by the empty string between a non-whitespace character and a whitespace
+/* If TAB has this value, blanks separate fields.  */
+enum { TAB_DEFAULT = CHAR_MAX + 1 };
+
+/* Tab character separating fields.  If TAB_DEFAULT, then fields are
+   separated by the empty string between a non-blank character and a blank
    character. */
-static char tab;
+static int tab = TAB_DEFAULT;
 
 /* Flag to remove consecutive duplicate lines from the output.
    Only the last of a sequence of equal lines will be output. */
@@ -305,7 +308,7 @@ Other options:\n\
   -S, --buffer-size=SIZE    use SIZE for main memory buffer\n\
 "), stdout);
       printf (_("\
-  -t, --field-separator=SEP use SEP instead of non- to whitespace transition\n\
+  -t, --field-separator=SEP use SEP instead of non-blank to blank transition\n\
   -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or %s\n\
                               multiple options specify multiple directories\n\
   -u, --unique              with -c: check for strict ordering\n\
@@ -618,6 +621,11 @@ specify_sort_size (char const *s)
 
   if (e == LONGINT_OK)
     {
+      /* If multiple sort sizes are specified, take the maximum, so
+         that option order does not matter.  */
+      if (n < sort_size)
+        return;
+
       sort_size = n;
       if (sort_size == n)
         {
@@ -769,7 +777,7 @@ begfield (const struct line *line, const
   /* The leading field separator itself is included in a field when -t
      is absent.  */
 
-  if (tab)
+  if (tab != TAB_DEFAULT)
     while (ptr < lim && sword--)
       {
         while (ptr < lim && *ptr != tab)
@@ -817,7 +825,7 @@ limfield (const struct line *line, const
      `beginning' is the first character following the delimiting TAB.
      Otherwise, leave PTR pointing at the first `blank' character after
      the preceding field.  */
-  if (tab)
+  if (tab != TAB_DEFAULT)
     while (ptr < lim && eword--)
       {
         while (ptr < lim && *ptr != tab)
@@ -866,7 +874,7 @@ limfield (const struct line *line, const
   */
 
   /* Make LIM point to the end of (one byte past) the current field.  */
-  if (tab)
+  if (tab != TAB_DEFAULT)
     {
       char *newlim;
       newlim = memchr (ptr, tab, lim - ptr);
@@ -2159,7 +2167,10 @@ set_ordering (register const char *s, st
           key->general_numeric = true;
           break;
         case 'i':
-          key->ignore = nonprinting;
+          /* Option order should not matter, so don't let -i override
+             -d.  -d implies -i, but -i does not imply -d.  */
+          if (! key->ignore)
+            key->ignore = nonprinting;
           break;
         case 'M':
           key->month = true;
@@ -2428,6 +2439,8 @@ main (int argc, char **argv)
           break;
 
         case 'o':
+          if (outfile != minus && strcmp (outfile, optarg) != 0)
+            error (SORT_FAILURE, 0, _("multiple output files specified"));
           outfile = optarg;
           break;
 
@@ -2440,15 +2453,28 @@ main (int argc, char **argv)
           break;
 
         case 't':
-          tab = optarg[0];
-          if (tab && optarg[1])
-            {
-              /* Provoke with `sort -txx'.  Complain about
-                 "multi-character tab" instead of "multibyte tab", so
-                 that the diagnostic's wording does not need to be
-                 changed once multibyte characters are supported.  */
-              error (SORT_FAILURE, 0, _("multi-character tab `%s'"), optarg);
-            }
+          {
+            int newtab = optarg[0];
+            if (! newtab)
+              error (SORT_FAILURE, 0, _("empty tab"));
+            if (optarg[1])
+              {
+                if (strcmp (optarg, "\\0") == 0)
+                  newtab = '\0';
+                else
+                  {
+                    /* Provoke with `sort -txx'.  Complain about
+                       "multi-character tab" instead of "multibyte tab", so
+                       that the diagnostic's wording does not need to be
+                       changed once multibyte characters are supported.  */
+                    error (SORT_FAILURE, 0, _("multi-character tab `%s'"),
+                           optarg);
+                  }
+              }
+            if (tab != TAB_DEFAULT && tab != newtab)
+              error (SORT_FAILURE, 0, _("incompatible tabs"));
+            tab = newtab;
+          }
           break;
 
         case 'T':

_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-coreutils
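
For anyone who wants to experiment with the new -t rules outside of
sort.c, here is a minimal standalone sketch of the sentinel approach
the patch takes.  TAB_DEFAULT and the order of the checks follow the
patch; parse_tab and the tab_status codes are hypothetical names made
up for this sketch (the patch itself calls error () and exits rather
than returning a status), so treat it as an illustration under those
assumptions, not as the code in sort.c.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Sentinel meaning "no -t option seen yet".  It lies outside the range
   of char, so it cannot collide with any real separator, NUL included.  */
enum { TAB_DEFAULT = CHAR_MAX + 1 };

/* Hypothetical status codes for this sketch only; the patch reports
   these conditions with error () instead.  */
enum tab_status { TAB_OK, TAB_EMPTY, TAB_MULTI_CHAR, TAB_INCOMPATIBLE };

/* Apply one -t argument ARG to *TAB, mirroring the checks in the patch.
   *TAB is left unchanged unless TAB_OK is returned.  */
enum tab_status
parse_tab (char const *arg, int *tab)
{
  assert (arg != NULL && tab != NULL);

  int newtab = arg[0];
  if (! newtab)
    return TAB_EMPTY;           /* -t '' is now an error.  */
  if (arg[1])
    {
      /* The two-character string \0 stands for a NUL separator;
         any other multi-character argument is rejected.  */
      if (strcmp (arg, "\\0") == 0)
        newtab = '\0';
      else
        return TAB_MULTI_CHAR;
    }
  /* Repeating the same separator is fine; a different one is not.  */
  if (*tab != TAB_DEFAULT && *tab != newtab)
    return TAB_INCOMPATIBLE;
  *tab = newtab;
  return TAB_OK;
}
```

A caller would start with "int tab = TAB_DEFAULT;" and call parse_tab
once per -t argument.  Because the "unset" state is no longer 0, a NUL
separator and the absence of -t can be told apart, which is what lets
begfield and limfield test "tab != TAB_DEFAULT" instead of "tab".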