Ugh.

So we all know that there's this syntax for formatting codes (née "interior
sequences") like C<< x >>.
And that tokenizes as three tokens:
  "C<< ",   open-C code
  "x",      content
  " >>"     close-code matching the C open-code


And this is explicated by what I wrote in perlpodspec, where I say that such
a code consists of...

* a capital letter (just US-ASCII [A-Z]) followed by two or more "<"'s and
one or more whitespace characters,

* any number of characters,

* one or more whitespace characters, and finally the first matching
sequence of two or more ">"'s, where the number of ">"'s equals the number
of "<"'s in the opening of this formatting code.
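For what it's worth, that rule can be sketched in a few lines of Python. This is only a rough illustration of the wording quoted above, not code from perlpodspec or from any Pod parser; the names OPEN_RE and match_code are invented here, and it assumes that a closer is simply whitespace followed by the same number of ">"'s as there were "<"'s.

```python
import re

# Hypothetical sketch of the quoted rule; OPEN_RE and match_code are
# made-up names, not part of any real Pod library.
# Opener: capital US-ASCII letter, two or more "<", one or more whitespace.
OPEN_RE = re.compile(r"([A-Z])(<{2,})\s+")

def match_code(text):
    """Return (letter, content) if text starts with a C<< ... >> style
    code, or None if it does not."""
    m = OPEN_RE.match(text)
    if m is None:
        return None
    letter, angles = m.group(1), m.group(2)
    # Closer: one or more whitespace, then as many ">" as there were "<".
    close_re = re.compile(r"\s+" + ">" * len(angles))
    c = close_re.search(text, m.end())
    if c is None:
        return None
    return letter, text[m.end():c.start()]

print(match_code("C<< $foo->bar >>"))   # ('C', '$foo->bar')
print(match_code("C<<< a >> b >>>"))    # ('C', 'a >> b')
```

Note how the count of ">"'s matters: in the second call the inner " >>" does not close the code, because the opener used three "<"'s.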


But I do not remember putting /immense/ thought into the question of
whether the content should consist of "any" (0+) number of characters or
"some" (1+) number of characters.
And now I'm beginning to wonder about two problems that occur when a C<< >>
code is empty (corresponding to an XML "<C></C>").

Notably, those problems are:
  How should C<<  >> (two spaces inside) tokenize?
And:
  How should C<< >> (just one space inside) tokenize?

I see two possibilities:

* a C start-code
* empty-string content
* an end-code matching the C start-code

or:

* a C start-code (consisting of the C<< and all the subsequent whitespace)
* a literal ">>"



I'm tempted to just stipulate that codes with the syntax like C<< ... >>
must not be empty, which pretty much allows the latter tokenization in both
cases.
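Here is a rough Python sketch of what that latter tokenization might look like. It is an assumption-laden illustration, not how any existing Pod parser works: the names OPEN_RE and tokenize are invented, it looks only at a single un-nested code at the very start of the string, and it lets the opener swallow all the whitespace that follows the "<"'s.

```python
import re

# Hypothetical sketch; OPEN_RE and tokenize are made-up names.
# The opener greedily takes ALL whitespace after the "<"'s, so whatever
# follows it is either a non-whitespace character or the end of the string.
OPEN_RE = re.compile(r"[A-Z]<{2,}\s+")

def tokenize(text):
    """Tokenize a string that may begin with one C<< ... >> style code,
    under the stipulation that such codes must not be empty."""
    m = OPEN_RE.match(text)
    if m is None:
        return [("text", text)]
    opener = m.group(0)
    tokens = [("open", opener)]
    # A closer is whitespace plus as many ">" as the opener had "<".
    close_re = re.compile(r"\s+" + ">" * opener.count("<"))
    c = close_re.search(text, m.end())
    if c is None:
        # No closer found: the rest (e.g. a bare ">>") is literal text.
        if text[m.end():]:
            tokens.append(("text", text[m.end():]))
        return tokens
    tokens.append(("text", text[m.end():c.start()]))
    tokens.append(("close", c.group(0)))
    if text[c.end():]:
        tokens.append(("text", text[c.end():]))
    return tokens

print(tokenize("C<< x >>"))   # open "C<< ", text "x", close " >>"
print(tokenize("C<<  >>"))    # open "C<<  ", then literal ">>"
print(tokenize("C<< >>"))     # open "C<< ", then literal ">>"
```

Because the opener has already eaten the whitespace, there is nothing left for a closer to match in the empty cases, so the ">>" falls out as literal text without any special-casing.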

First, there's the completely obvious argument that C<< ... >> codes were
devised specifically to handle the cases where the intended content
contained a literal ">", as in C<< $foo->bar >>, so using them with
no-content is daffy.
Secondly, empty codes in general (whether C<< ... >> or C<...>) in Pod are
of extremely dubious value -- I can't imagine why one would want a B<> or an
I<> or a C<> or an F<> or an S<>, much less why one would want to use B<< >>
syntax for them.
That's even beside all the implementational hassles that can lurk in trying
to make a tokenizer deal with C<< >> and/or C<<  >> as if either/both were
C<< Z<> >>!


--
Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/
