On Tue, Jan 12, 2010 at 12:42 PM, J.C. Roberts <list-...@designtools.org>
wrote:
> On Mon, 11 Jan 2010 13:13:44 -0800 Philip Guenther <guent...@gmail.com>
> wrote:
>> On Mon, Jan 11, 2010 at 1:01 AM, J.C. Roberts
>> <list-...@designtools.org> wrote:
>> > Below is a "test case" ksh shell script to show examples of very
>> > minor omissions in the ksh(1) man page regarding testing for zero
>> > length and non-zero length. A patch for ksh.1 is below.
>> >
>> > In the script, you'll need to manually fix the assignment of $foo
>> > for test set #2, since sending a raw null through email is a major
>> > pain. I have it set below as text '^@' but you'll need to replace
>> > it with a real null (CTRL-V CTRL-@).
>>
>> NULs are always silently dropped from strings by the shell, so
>> this part of your script duplicates the part where foo is the empty
>> string.
>
> Ah, good to know. Thanks. The reason I went out of my way to
> specifically test NULL is due to the wording in the standard.
>
> # http://www.opengroup.org/onlinepubs/9699919799/utilities/test.html
> #
> # string
> #     True if the string string is not the null string; otherwise, false
>
> The wording for the ksh(1) man page says ``[ string ]'' tests for
> non-zero length, so this gets into how NULL is defined... --until one
> learns they've been stripped. ;-)

Ah, the glory of terms that are defined differently in different
places.  In the POSIX standard, per the "Base Definitions" part of the
standard, "null string" *means* "empty string".

As for actual NUL bytes in a shell variable value, there's no
specified way to get one there.  A NUL byte in a script means the
input isn't a text file, and probably violates the grammar.  A NUL
byte in a command expansion specifically results in unspecified
behavior (XCU 2.6.3).  'read' is only required to accept text input.
And so on.  Sorry, no binary data here.

...
>> (As a side-note, both the existing "Note" and your proposed addition
>> are wrong about the unquoted constructs having problems when the
>> variable's value is '!' or '-n'.  It's only if it's unset, empty, or
>> contains characters from $IFS that a problem occurs.  Some other OSes
>> (*cough* Solaris *cough*) have versions of /bin/sh that can choke on
>> those, but OpenBSD's follows the rules there.)
>
> On OpenBSD, with ``test ...'' and ``[...]'' the behavior of unary
> operators stored within an operand makes sense, but only if you think
> it through completely to realize the field separation of operand
> strings exists to *ENABLE* you to stuff other operators or operands
> into the test.

Hmm, that doesn't jibe with my understanding of history.  'test' and
'[' were separate programs for a while before anyone built them into
the shell.  When they were (or are!) just programs, they had/have no
access to the 'original' form of the command line.  All 'test' sees is
an array of character strings that, in practice, were the result of
variable expansion and field-splitting by the shell.  So it's not like
someone said "I know, let's perform variable expansion on *all* the
arguments to 'test' so that you can control the operations performed
with variables!"  That was just fallout from the original "need some way
to do tests from the shell; I'll write a separate program to do that".
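
To make that concrete, here is a minimal sketch of what any ordinary
command receives.  show_args is a hypothetical stand-in for 'test' (it is
not part of any shell); it just prints the argument array it was handed,
assuming the default IFS.

```shell
# show_args is a hypothetical stand-in for an ordinary command such as
# 'test': it reports the number of arguments and each argument in <>.
show_args() {
    printf '%d:' "$#"
    for arg in "$@"; do
        printf ' <%s>' "$arg"
    done
    printf '\n'
}

var='a b'
show_args $var = x      # unquoted: the shell splits $var before the call
show_args "$var" = x    # quoted: $var arrives as one argument
```

The first call should report four arguments and the second three; the
command itself never sees which form was written in the script.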


...
> Perhaps a more accurate description like, "unary followed by IFS" would
> be better?

...which is a tiny subset of the already-known-to-be-a-problem
"contains IFS at all" values.
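
A rough sketch of which unquoted values actually change the outcome,
assuming the default IFS and a 'test' that follows the POSIX
argument-count rules.  The outcome helper is hypothetical and only
reports whether the command it is given succeeded.

```shell
# outcome is a hypothetical helper: run the given command and print
# "true" or "false" according to its exit status.
outcome() {
    if "$@" 2>/dev/null; then
        echo true
    else
        echo false
    fi
}

foo='!'
outcome [ $foo = '!' ]    # true: three arguments, '=' is a binary operator
foo='-n'
outcome [ $foo = -n ]     # true: same three-argument rule
foo=''
outcome [ -n $foo ]       # true: collapses to [ -n ], a one-argument test
outcome [ -n "$foo" ]     # false: the empty string really is tested
foo='x = y'
outcome [ $foo ]          # false: split into the comparison [ x = y ]
outcome [ "$foo" ]        # true: one non-empty string
```

Only the empty value and the value containing spaces give a different
answer when the quotes are dropped.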


> I'm still trying to fully understand the behavior of strings containing
> a newline <sigh> and I need to poke at it a bit more, but I think having
> at least some mention of behavior of tests when operands contain a
> newline is an improvement.

There's nothing inherently magical about newlines in strings...other
than the fact that it's part of the default value of IFS.  If you set
IFS to not contain newline then it's just a plain character in
expansions:

IFS=' '
foo='foo
bar'
[ $foo != blah ] && echo true    # no syntax error here!
set -- $foo
echo "foo split into $# word(s)"
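
For contrast, a small sketch that counts the fields produced from the
same value under two IFS settings.  count_fields is a hypothetical
helper; its body is a subshell so the caller's IFS is left alone.

```shell
# count_fields is a hypothetical helper: split the second argument using
# the IFS value given as the first argument and print the field count.
# The parenthesized body runs in a subshell, so nothing leaks out.
count_fields() (
    IFS=$1
    set -f          # keep pathname expansion out of the picture
    set -- $2
    echo "$#"
)

nl='
'
foo="foo${nl}bar"

count_fields ' ' "$foo"         # IFS is a space only: newline is ordinary
count_fields " ${nl}" "$foo"    # IFS includes newline: the value splits
```

The first call should print 1 and the second 2.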


...
>> While directing people to use [[...]] might be okay for the ksh(1)
>> manpage, it won't fly for the sh(1) manpage.  Simply saying "quote
>> your arguments!" on both would be simpler.
>
> Excellent Point!
>
> The more provocative question is, "Why would you want field splitting
> of operands in the [...] tests?"
>
> Yes, I know the field splitting exists so you can feed various operators
> and operands into the test,

No!  It exists because test is just a command!  'test' sees *exactly*
the same argument handling by the shell as any other normal command!


> but I have a hunch the [[...]] syntax was
> added to ksh specifically because someone noticed the field splitting
> stuff typically does more harm than good if you're not thinking it
> through. Of course, using the ksh [[...]] feature makes scripts less
> portable between shells, so quoting operands is a better answer.
>
> Do you think the addition of the [[...]] syntax in ksh was due to
> misunderstanding of field splitting (failure to quote), prevention of
> less than easy to understand syntax like:
>
>        ``[ -o foo -o -o !foo ]''    # From the ksh man page.
>
> and similar?

Trust your hunch.  I'm 100% certain that David Korn had no
misunderstandings at all about the uses and misuses of field-splitting
and grammar interpretation when he added [[...]] to his shell.  For
example:
    https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=9010


As for quoting being a "better answer", that depends on how you're
measuring goodness.  If strict portability is a MUST, then sure,
[[...]] is ruled out, but it's not the only answer.  For example,
depending on the script I might be tempted to do "IFS=''; set -f"
at the start and only quote literal shell meta-characters (including
whitespace).  If you do that, the visual syntax will match the
'expanded' syntax.  Then splitting and file-globbing can be contained
in functions which reset IFS or do 'set +f' for just their duration.
cf. David again at
    http://www.opengroup.org/austin/mailarchives/ag/msg08413.html
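
A minimal sketch of that style, not taken from either linked message.
One assumption to flag: an unset IFS still splits on space, tab, and
newline, so the sketch sets IFS to the empty string to switch splitting
off entirely.  split_on_colon is a hypothetical helper name.

```shell
# Script-wide setup: no field splitting, no pathname expansion.
# (Assumption: IFS is set empty, since an unset IFS means the default.)
IFS=''
set -f

# split_on_colon is a hypothetical helper: the subshell body confines the
# IFS change, so splitting is enabled only for the duration of the call.
split_on_colon() (
    IFS=:
    set -- $1
    printf '%s\n' "$@"
)

v='a b *'
set -- $v                    # no quotes needed: still exactly one word
echo "$# word(s): $1"

split_on_colon 'a:b c:d'     # one field per line: a, "b c", d
```

Pathname expansion can be confined the same way, with 'set +f' inside a
subshell function.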

If strict portability isn't a MUST, then [[...]] is often "better"
than [...].  About the only exception might be code that puts 'test'
operators in variables, but that's rare and can be rewritten without
too much effort.  Such code is often difficult to understand anyway;
flexibility can be a Bad Thing...
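
One possible rewrite of operator-in-a-variable code, as a sketch: pass an
operator *name* and dispatch on it with case, so the operator never has
to appear as expanded text inside the test.  The compare function and its
operator names are hypothetical.  The sketch uses [...] to stay portable;
in ksh each branch could just as well use [[...]].

```shell
# compare is a hypothetical helper: compare two integers using an
# operator name (lt, eq, gt) instead of an operator stored in a variable.
compare() {
    case $2 in
        lt) [ "$1" -lt "$3" ] ;;
        eq) [ "$1" -eq "$3" ] ;;
        gt) [ "$1" -gt "$3" ] ;;
        *)  echo "compare: unknown operator: $2" >&2
            return 2 ;;
    esac
}

op=lt
if compare 3 "$op" 5; then
    echo "3 is less than 5"
fi
```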


Philip Guenther
