On Mon, Mar 09, 2009 at 10:32:02AM -0700, jerry gay wrote:
> > To make things a bit quicker for people writing custom versions of
> > <ws> (which may need to include "comment whitespace"), the Parrot
> > Compiler Toolkit also provides an optimized <ww> rule that matches
> > only between a pair of word characters. Then the default definition
> > of <ws> becomes
> >
> > token ws { <!ww> \s* }
>
> if you need a mnemonic to help you remember what 'ww' means, use 'within
> word'.
>
> this reminds me that pge's <ww> may be incorrect in its treatment of
> <apostrophe>. these characters (<['-]> by default) are word
> characters, but i don't think that's been tested, and i don't think
> it's been implemented, either.
A couple of clarifications:
- PGE doesn't implement <ww> by default, because that's not (yet?)
  part of the spec. It only appears in PCT::Grammar, for people
  using the Parrot Compiler Toolkit to create languages.

- AFAICT, apostrophe and hyphen are not yet "word characters" in
  the sense of being members of \w . That is, they're considered
  to be valid in identifiers, but only when they are immediately
  preceded by a word character and immediately followed by an
  alphabetic character. Otherwise they're not part of the
  identifier. (At least, that's how the current STD.pm reads.)
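
To make the two points above concrete, here is a rough sketch in Python (not PGE or STD.pm code) that emulates the behaviour as described in this thread. The function names within_word, match_ws, and leading_identifier are made up for illustration. It assumes that "word character" means Python's \w and that "alphabetic" means a letter or underscore; the real definitions in PCT::Grammar and STD.pm may differ.

```python
import re

# Assumption: "word character" is Python's \w, and "alphabetic" is a letter
# or underscore ([^\W\d]). PGE and STD.pm may define these differently.
WORD = re.compile(r"\w")
SPACES = re.compile(r"\s*")

# An identifier starts with an alphabetic character, continues with word
# characters, and may contain ' or - only when the next character is
# alphabetic (the previous one is necessarily a word character here).
IDENTIFIER = re.compile(r"[^\W\d]\w*(?:['-][^\W\d]\w*)*")


def within_word(text, pos):
    """Emulate <ww>: true only between a pair of word characters."""
    if pos <= 0 or pos >= len(text):
        return False
    return bool(WORD.match(text[pos - 1]) and WORD.match(text[pos]))


def match_ws(text, pos):
    """Emulate token ws { <!ww> \\s* }: return the end offset, or None."""
    if within_word(text, pos):
        return None  # <!ww> fails in the middle of a word
    return SPACES.match(text, pos).end()


def leading_identifier(text):
    """Return the identifier at the start of text, or None if there is none."""
    m = IDENTIFIER.match(text)
    return m.group(0) if m else None
```

Under those assumptions, leading_identifier("foo-bar") returns "foo-bar", while leading_identifier("foo-1") returns only "foo", because the hyphen is followed by a digit rather than an alphabetic character. Likewise match_ws("ab", 1) returns None, since offset 1 sits within a word.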
Pm