On Fri, Sep 07, 2007 at 02:45:52AM -0500, Patrick R. Michaud wrote:
: On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
: > Log:
: > old <?foo> is now <+foo> to suppress capture
: > new <?foo> now is zero-width like <!foo>
:
: I really like the change from <?foo> to <+foo>, but I think there's
: a conflict (or at least some confusion) in the way the new spec is
: worded, especially as it relates to character class sets.

I'm actually still of two minds whether it's proper to overload <+foo>
like that, and what we end up with may well depend on revisions to the
binding syntax. But it can be <+foo> for now, assuming we can deal with
the ambiguities you point out. 'Course, by the time we're done with
that, we might well decide <+foo> is a bad plan...

: Both old and new versions of S05 say:
:
: If the first character after the identifier is whitespace, the
: subsequent text (following any whitespace) is passed as a regex,
: so <foo bar> is more or less equivalent to <foo(/bar/)> .
:
: In the previous version of S05, the non-capturing form of <foo bar>
: would be <?foo bar>. Here, the whitespace after "foo" indicated
: that "bar" was to be parsed and passed to foo as a regex.
:
: In the new version of S05, the non-capturing form of <foo bar>
: would seem to be <+foo bar>. Okay, I can handle that. However,
: S05 also says that " <foo+bar-baz> can be written as <+ foo + bar - baz> ".
: Presumably this second form would also allow "<+foo + bar - baz>",
: which seems to conflict slightly with the notion that <+foo bar>
: is the non-capturing form of <foo bar>. In other words, the
: whitespace character following "<+foo" doesn't seem to be
: sufficient to indicate how the remainder is to be processed --
: we have to look beyond the whitespace for a leading plus or minus.

If we stick with +, one approach might be to simply disallow whitespace
in composite character classes.

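
To make that concrete, here is a rough Python sketch of what "no whitespace in composite character classes" would buy us: the single character right after the identifier decides everything, with no lookahead past the whitespace. The function name and the category labels are made up for illustration and are not anything in S05, and treating *any* whitespace in the set as an error is the bluntest possible reading of the idea.

```python
import re

# Hypothetical sketch, not S05: handles only the <+ident...> form.
_HEAD = re.compile(r"<\+([A-Za-z_]\w*)")

def classify_no_whitespace(assertion: str) -> str:
    """Classify <+foo...> assuming whitespace is banned inside class sets."""
    m = _HEAD.match(assertion)
    if not m or not assertion.endswith(">"):
        raise ValueError(f"not a <+ident...> assertion: {assertion!r}")
    rest = assertion[m.end():-1]  # text between the identifier and '>'
    if rest == "":
        return "bare"
    if rest[0] in "+-":
        # Assumed rule: a composite character class may not contain whitespace.
        if any(ch.isspace() for ch in rest):
            raise ValueError("whitespace not allowed in composite character class")
        return "charclass-set"
    if rest[0].isspace():
        # Whitespace alone decides: the remainder is handed over as a regex.
        return "regex-arg"
    return "other"
```

Under this reading `<+foo+bar-baz>` is a set and `<+foo bar>` passes a regex, while `<+foo + bar>` also lands in the regex branch with `+ bar` as its argument, which a regex parser would then have to reject.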
: Perhaps S05 is addressing this when it says
:
: An initial identifier is taken as a character class, so the
: first character after the identifier doesn't matter in this
: case, and you can use whitespace however you like.
:
: Here I find this wording very unclear -- it doesn't tell me
: what is distinguishing the "doesn't matter in this case" part
: between <+foo + bar> and <+foo bar>.

What, me unclear? How could that happen? :-)
[Don't answer that...]

: Since the S05 spec has changed so that all punctuation is meta,
: I'm thinking we may be able to simplify the spec altogether.
: Previously the "whitespace following the identifier" was
: used to distinguish <foo-bar> from <foo -bar>, or <alpha-[Jj]>
: from <alpha -[Jj]>. Since it's now effectively impossible for
: a regex to begin with a bare plus or minus character, we may be
: able to alter the "whitespace following identifier" wording such
: that <foo-bar> and <foo - bar> are identical. Perhaps
: something like:
:
: - if the character following the identifier is a left paren,
: it's a call
:
: <foo('bar')>
: <+foo('bar')>
: <!foo('bar')>
:
: - if the character following the identifier is a colon, the rest
: of the text (following any whitespace) is passed as a string
:
: <foo: bar> # same as <foo('bar')>
: <+foo: bar>
: <!foo: bar>
:
: - if the identifier is followed by a plus or minus (with optional
: intervening whitespace), it's a set of character classes
:
: <foo+baz-bar>
: <foo + baz - bar> # same thing
: <+foo + baz - bar> # also the same
:
: - anything else following whitespace is a regex to be passed
:
: <foo bar> # same as <foo(/bar/)>
: <+foo bar> # same as <+foo(/bar/)>
: <!foo bar> # same as <!foo(/bar/)>
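
The four rules quoted above can be sketched as a small classifier. This is only an illustration in Python, not how any actual implementation parses these: the function name, the category labels, and the simplified identifier pattern are all assumptions, and the "bare" case for a lone identifier is added just so the sketch is total.

```python
import re

# Hypothetical sketch of the proposed rules; names and labels are invented.
# Optional '+' or '!' prefix, optional whitespace, then a simple identifier.
_IDENT = re.compile(r"([+!]?)\s*([A-Za-z_]\w*)")

def classify_assertion(assertion: str) -> str:
    """Classify the text of a <...> assertion by what follows the identifier."""
    text = assertion.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"not an assertion: {assertion!r}")
    body = text[1:-1]
    m = _IDENT.match(body)
    if not m:
        raise ValueError(f"no leading identifier in {assertion!r}")
    rest = body[m.end():]      # everything after the identifier
    stripped = rest.lstrip()   # same, with leading whitespace skipped
    if not stripped:
        return "bare"          # <foo>, <+foo>, <!foo>
    if rest[0] == "(":
        return "call"          # rule 1: left paren directly after identifier
    if rest[0] == ":":
        return "string-arg"    # rule 2: colon directly after identifier
    if stripped[0] in "+-":
        return "charclass-set" # rule 3: plus/minus, whitespace optional
    if rest[0].isspace():
        return "regex-arg"     # rule 4: anything else after whitespace
    raise ValueError(f"unrecognized form: {assertion!r}")
```

Note that rule 3 has to be tested before rule 4, which is exactly the "look beyond the whitespace" step: `<+foo + baz - bar>` and `<+foo bar>` both have whitespace after the identifier, and only the next non-blank character tells them apart.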

That's assuming we don't define any metasyntax that starts with + or
- in the future, such as bare +[ a..z ], or +[ ...] as a variant of
[...]+. And while we could resolve the ambiguity of the second +
by fiat, it would probably be better if the ambiguity didn't arise
in the first place. If <+foo ...> is going to change the parsing
of ... at all, then it should probably do so consistently, which
means <+foo> is really a bad plan. (Also, there are already too
many +'s in patterns.) So while it's cute to generalize <+foo> to
"establish the initial universal set of matches", I suspect it's
likely to change to something else. Possibilities I've been mulling:

    <~ws>    # "I just want to match as a string"
    <\ws>    # "Don't do the normal thing with the following"
    <.ws>    # "Just call the ws method"
    <=ws>    # "Bind to nothing", assuming <foo=ws> binds $<foo>

Damian points out that it's a little strange for = to enable binding
in the <foo=ws> case but disable it in the <=ws> case. It would be
possible to make <=ws> mean <ws=ws> and <ws> not capture at all.
Offhand I'd say that would be bad huffmanization, but I need to look
at STD some more. It also depends on any post-binding syntax
resembling:

    <ws> -> $foo {...}

and whether that is deemed preferable to <foo=ws> or $foo=<ws> or
whatever. (One nice thing about the post syntax is that we could know
for sure that we're creating a new var, not binding an existing one,
so [] -> $x; might in fact declare $x as a "my" variable that happens
to scope properly under backtracking. But I digress.)

Other available chars:

    <`ws>
    <^ws>
    <&ws>
    <*ws>
    <-ws>
    <|ws>
    <:ws>
    <;ws>
    </ws>

Larry