On Wed, May 11, 2005 at 08:00:20PM -0500, Patrick R. Michaud wrote:
: Somehow I'd like to get rid of those inner angles, so 
: that we always use  <+alpha>, <+digit>, <-sp>, <-punct> to 
: indicate named character classes, and specify combinations 
: with constructions like  <+alpha+punct-[aeiou]>  and  <+word-[_]>.  
: We'd still allow <[abc]> as a shortcut to <+[abc]>.

I like it.

: I haven't thought far ahead to the question of whether
: character classes would continue to occupy the same namespace
: as rules (as they do now) or if they become specialized kinds
: of rules or what.  I'll just leave it at this for now and
: see what the rest of p6l thinks.

Hmm, well, positive matches can be defined to traverse the longest
sequence matched, even if that is actually multiple characters by
some reckoning or other.  Negative matches, on the other hand, can
really only skip one character in the current view, regardless of
how long the sequences in the class are; those sequences function as
a negative lookahead for the subsequent one-character skip.  In other
words, <-alpha> really means something like [<!alpha> .].
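
To make the [<!alpha> .] reading concrete, here is a rough sketch using
Python's re module rather than Perl 6 rules.  The helper names
(negated_class, skipped) and the sample class containing "ch" and "ll"
are made up for illustration and are not part of any proposal; the only
point is that the lookahead may inspect a multi-character sequence while
the skip itself consumes exactly one character.

```python
import re

def negated_class(members):
    """Build a regex fragment that behaves like [<!class> .].

    `members` is a list of literal strings (hypothetical class members);
    some of them may be longer than one character.
    """
    # Try longer members first so the lookahead prefers the longest sequence.
    ordered = sorted(members, key=len, reverse=True)
    alternation = "|".join(re.escape(m) for m in ordered)
    # Negative lookahead over the whole class, then skip one character.
    return rf"(?:(?!{alternation}).)"

# Illustrative class whose members are two-character sequences.
not_digraph = re.compile(negated_class(["ch", "ll"]))

def skipped(text):
    """Return the single character skipped at the start of text, or None."""
    m = not_digraph.match(text)
    return m.group(0) if m else None
```

With these definitions, skipped("cat") returns "c", while skipped("chat")
returns None: the two-character member vetoes the match, yet a
successful match never advances by more than one character.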

But then it's not entirely clear how character class set theory works.
Another thing we have to work out.  Obviously + and - are ordered,
and we probably want & and | for actual set operations.  But does
<-[a]> negate only a preceding 'a' or all characters that use 'a'
as the base character along with subsequent combining characters?
We're almost getting into a wildcarding situation there...
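
The ambiguity can be seen in an engine that works purely on code
points.  The following is only a sketch in Python, whose re module
knows nothing about base-plus-combining sequences; the variable and
function names are invented for the example.  It shows that the same
visible character is accepted or rejected by a negated class depending
on whether it is stored composed or decomposed.

```python
import re
import unicodedata

# 'a' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "a\u0301"
# NFC normalization composes the pair into the single code point U+00E1.
composed = unicodedata.normalize("NFC", decomposed)

# A code-point-level negated class: "anything but the code point 'a'".
not_a = re.compile(r"[^a]")

def first_match(text):
    """Return what [^a] matches at the start of text, or None."""
    m = not_a.match(text)
    return m.group(0) if m else None

def first_found(text):
    """Return the first thing [^a] matches anywhere in text, or None."""
    m = not_a.search(text)
    return m.group(0) if m else None
```

Here first_match(composed) succeeds, because U+00E1 is not the code
point 'a', while first_match(decomposed) fails on the bare base
character.  A search over the decomposed form, first_found(decomposed),
matches the combining accent on its own, which is presumably not what
anyone means by "not an a".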

In any event, the take-home message here is that characters cannot
be assumed to be constant width any more.

I think this argues that character classes really are rules of a sort.

Larry
