On Tue, Jan 12, 2010 at 12:42 PM, J.C. Roberts <list-...@designtools.org> wrote:
> On Mon, 11 Jan 2010 13:13:44 -0800 Philip Guenther <guent...@gmail.com>
> wrote:
>> On Mon, Jan 11, 2010 at 1:01 AM, J.C. Roberts
>> <list-...@designtools.org> wrote:
>> > Below is a "test case" ksh shell script to show examples of very
>> > minor omissions in the ksh(1) man page regarding testing for zero
>> > length and non-zero length. A patch for ksh.1 is below.
>> >
>> > In the script, you'll need to manually fix the assignment of $foo
>> > for test set #2, since sending a raw null through email is a major
>> > pain. I have it set below as text '^@' but you'll need to replace
>> > it with a real null (CTRL-V CTRL-@).
>>
>> NULs are always silently dropped from strings by the shell, so this
>> part of your test script duplicates the part where foo is the empty
>> string.
>
> Ah, good to know. Thanks. The reason I went out of my way to
> specifically test NULL is due to the wording in the standard.
>
> # http://www.opengroup.org/onlinepubs/9699919799/utilities/test.html
> #
> # string
> #   True if the string string is not the null string; otherwise, false
>
> The wording for the ksh(1) man page says ``[ string ]'' tests for
> non-zero length, so this gets into how NULL is defined... --until one
> learns they've been stripped. ;-)
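Since POSIX uses "null string" to mean the empty string, "[ string ]" and the -n/-z primaries should agree for any value. A minimal POSIX sh sketch to check that; the helper name check is hypothetical and not part of the ksh.1 patch:

```shell
#!/bin/sh
# Hypothetical helper: report whether each of three equivalent tests
# considers $1 non-empty.  Every expansion is quoted, so the argument
# count never changes, even for an empty or operator-looking value.
check() {
    a=false b=false c=false
    [ "$1" ]    && a=true   # one argument: true if not the null string
    [ -n "$1" ] && b=true   # explicit non-zero length test
    [ -z "$1" ] || c=true   # negated zero length test
    printf '<%s> %s %s %s\n' "$1" "$a" "$b" "$c"
}
check ''      # <> false false false
check 'x'     # <x> true true true
check '-n'    # <-n> true true true
```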

Ah, the glory of terms that are defined differently in different
places. In the POSIX standard, per the "Base Definitions" part of the
standard, "null string" *means* "empty string".

As for actual NUL bytes in a shell variable value, there's no specified
way to get one there. A NUL byte in a script means the input isn't a
text file, and probably violates the grammar. A NUL byte in the output
of a command substitution specifically results in unspecified behavior
(XCU 2.6.3). 'read' is only required to accept text input. And so on.
Sorry, no binary data here.

...

>> (As a side-note, both the existing "Note" and your proposed addition
>> are wrong about the unquoted constructs having problems when the
>> variable's value is '!' or '-n'. It's only if it's unset, empty, or
>> contains characters from $IFS that a problem occurs. Some other OSes
>> (*cough* Solaris *cough*) have versions of /bin/sh that can choke on
>> those, but OpenBSD's follows the rules there.)
>
> On OpenBSD, with ``test ...'' and ``[...]'' the behavior of unary
> operators stored within an operand makes sense, but only if you think
> it through completely to realize the field separation of operand
> strings exists to *ENABLE* you to stuff other operators or operands
> into the test.

Hmm, that doesn't jibe with my understanding of history. 'test' and
'[' were separate programs for a while before anyone built them into
the shell. When they were (or are!) just programs, they had/have no
access to the 'original' form of the command line. All 'test' sees is
an array of character strings that, in practice, were the result of
variable expansion and field-splitting by the shell. So it's not like
someone said "I know, let's perform variable expansion on *all* the
arguments to 'test' so that you can control the operations performed
with variables!" That was just fallout from the original "need some way
to do tests from the shell; I'll write a separate program to do that".

...
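To make "all 'test' sees is an array of character strings" concrete, here is a small POSIX sh sketch. The show_args function is hypothetical: it just reports its argument count and each argument, which is the same view '[' gets after expansion and field splitting. It also shows why an empty value, not '-n' itself, is the dangerous case for unquoted operands:

```shell
#!/bin/sh
# Hypothetical helper: print the argument count followed by each
# argument in angle brackets.  This is all '[' ever sees.
show_args() {
    printf '%s:' "$#"
    for arg in "$@"; do
        printf ' <%s>' "$arg"
    done
    printf '\n'
}

foo='a b'
show_args -n $foo      # 3: <-n> <a> <b>   (unquoted: split into two words)
show_args -n "$foo"    # 2: <-n> <a b>     (quoted: one word)

foo=''
show_args -n $foo      # 1: <-n>           (unquoted empty value vanishes)
show_args -n "$foo"    # 2: <-n> <>        (quoted: explicit empty argument)

# So "[ -n $foo ]" is really "[ -n ]": a one-argument test on the
# non-empty string "-n", which is true even though foo is empty.
if [ -n $foo ];   then echo 'unquoted: true'; else echo 'unquoted: false'; fi
if [ -n "$foo" ]; then echo 'quoted: true';   else echo 'quoted: false';   fi
```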
> Perhaps a more accurate description like, "unary followed by IFS" would
> be better?

...which is a tiny subset of the already-known-to-be-a-problem
"contains IFS at all" values.

> I'm still trying to fully understand the behavior of strings containing
> a newline <sigh> and I need to poke at it a bit more, but I think having
> at least some mention of behavior of tests when operands contain a
> newline is an improvement.

There's nothing inherently magical about newlines in strings...other
than the fact that it's part of the default value of IFS. If you set
IFS to not contain newline then it's just a plain character in
expansions:

IFS=' '
foo='foo
bar'
[ $foo != blah ] && echo true    # no syntax error here!
set -- $foo
echo "foo split into $# word(s)"

...

>> While directing people to use [[...]] might be okay for the ksh(1)
>> manpage, it won't fly for the sh(1) manpage. Simply saying "quote
>> your arguments!" on both would be simpler.
>
> Excellent Point!
>
> The more provocative question is, "Why would you want field splitting
> of operands in the [...] tests?"
>
> Yes, I know the field splitting exists so you can feed various operators
> and operands into the test,

No! It exists because test is just a command! 'test' sees *exactly*
the same argument handling by the shell as any other normal command!

> but I have a hunch the [[...]] syntax was
> added to ksh specifically because someone noticed the field splitting
> stuff typically does more harm than good if you're not thinking it
> through. Of course, using the ksh [[...]] feature makes scripts less
> portable between shells, so quoting operands is a better answer.
>
> Do you think the addition of the [[...]] syntax in ksh was due to
> misunderstanding of field splitting (failure to quote), prevention of
> less than easy to understand syntax like:
>
> ``[ -o foo -o -o !foo ]'' # From the ksh man page.
>
> and similar?

Trust your hunch.
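Because '[' is just a command, two things follow, sketched here in POSIX sh: an operator stored in a variable reaches it like any other word, and quoting every operand keeps the argument count fixed so that, under the POSIX three-argument rule, a middle '=' is always the operator. The helper same is hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# '[' is just a command, so an operator stored in a variable is expanded
# and handed to it like any other word.
op='-z'
val=''
if [ $op "$val" ]; then
    echo 'operator from a variable: val is empty'
fi

# Hypothetical helper: compare two operands with every expansion quoted.
# Quoting keeps the argument count at exactly three, and POSIX then
# treats a middle '=' as the operator, even when an operand looks like
# '!' or '-n'.
same() {
    if [ "$1" = "$2" ]; then
        echo "<$1> = <$2>"
    else
        echo "<$1> != <$2>"
    fi
}
same '-n' '-n'     # <-n> = <-n>
same '!' '-n'      # <!> != <-n>
same 'a b' 'a b'   # <a b> = <a b>
same '' '-z'       # <> != <-z>
```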

I'm 100% certain that David Korn had no misunderstandings at all about
the uses and misuses of field-splitting and grammar interpretation when
he added [[...]] to his shell. For example:

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=9010

As for quoting being a "better answer", that depends on how you're
measuring goodness. If strict portability is a MUST, then sure, [[...]]
is ruled out, but it's not the only answer. For example, depending on
the script I might be tempted to do "unset IFS; set -f" at the start
and only quote literal shell meta-characters (including whitespace).
If you do that, the visual syntax will match the 'expanded' syntax.
Then splitting and file-globbing can be contained in functions which
reset IFS or do 'set +f' for just their duration. Cf. David again at

http://www.opengroup.org/austin/mailarchives/ag/msg08413.html

If strict portability isn't a MUST, then [[...]] is often "better" than
[...]. About the only exception might be code that puts 'test'
operators in variables, but that's rare and can be rewritten without
too much effort. Such code is often difficult to understand anyway;
flexibility can be a Bad Thing...

Philip Guenther
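A rough POSIX sh sketch of the "no splitting or globbing by default, contain it in functions" style. One caveat when trying it: POSIX says an unset IFS behaves as <space><tab><newline>, so it is an empty IFS that actually turns field splitting off; the sketch assumes that reading. The split_words helper is hypothetical:

```shell
#!/bin/sh
# Turn off field splitting (empty IFS) and pathname expansion (set -f)
# for the whole script, so an unquoted $var expands to one word, exactly
# as it reads on the page.  An empty value still expands to no words.
IFS=''
set -f

# Hypothetical helper: print each whitespace-separated word of $1 on its
# own line.  The "( ... )" body runs in a subshell, so 'unset IFS' (which
# POSIX treats as <space><tab><newline>) and 'set +f' last only for the
# duration of the call.
split_words() (
    unset IFS
    set +f
    for word in $1; do
        printf '%s\n' "$word"
    done
)

foo='a b  c'
[ $foo = 'a b  c' ] && echo 'unquoted $foo stayed a single word'
split_words "$foo"    # prints a, b and c on separate lines
```

Running the body in a subshell rather than saving and restoring IFS by hand is a design choice here; it costs a fork per call but guarantees the outer settings cannot leak.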