Quoting Dan Sugalski ([EMAIL PROTECTED]):
> At 11:54 AM 11/5/2001 -0800, Steve Fink wrote:
> > > >It's pretty
> > > >much functional, including reOneof.  Still, these could be useful
> > > >internal functions... *ponder*
> > >
> > > I was thinking that the places they could come in really handy for were
> > > character classes. \w, \s, and \d are potentially a lot faster this way,
> > > 'specially if you throw in Unicode support. (The sets get rather a bit
> > > larger...) It also may make some character-set independence easier.
> >
> >But why would you be generating character classes at runtime?
> 
> Because someone does:
> 
>    while (<>) {
>          next unless /[aeiou]/;
>    }
> 
> and we want that character class to be reasonably fast?

? So don't generate it at runtime. When you generate the opcode
sequence for the regex, emit a bit vector into the constant table and
refer to it by address in the matchCharClass op's arguments. Be fancy
and check that you haven't already emitted that bit vector. Am I
missing something?
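
A minimal sketch of the compile-time approach described above, assuming a 256-bit bitmap per class for single-byte encodings. The op name matchCharClass comes from the message; everything else (CharClass, ConstTable, cc_from_chars, const_intern, the fixed table size, and referring to the constant by index rather than by address) is hypothetical and only for illustration, not Parrot's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CC_BYTES   32   /* 256 bits: one bit per possible byte value */
#define MAX_CONSTS 64   /* arbitrary cap, just for this sketch */

typedef struct { uint8_t bits[CC_BYTES]; } CharClass;

typedef struct {
    CharClass entries[MAX_CONSTS];
    size_t    count;
} ConstTable;

/* Compile time: build the bitmap for a bracketed class such as "aeiou". */
CharClass cc_from_chars(const char *members) {
    CharClass cc;
    memset(&cc, 0, sizeof cc);
    for (const unsigned char *p = (const unsigned char *)members; *p; p++)
        cc.bits[*p >> 3] |= (uint8_t)(1u << (*p & 7));
    return cc;
}

/* Compile time: return the index of an identical bitmap if one was
 * already emitted, otherwise append it. Returns -1 if the table is full. */
int const_intern(ConstTable *t, const CharClass *cc) {
    for (size_t i = 0; i < t->count; i++)
        if (memcmp(t->entries[i].bits, cc->bits, CC_BYTES) == 0)
            return (int)i;
    if (t->count == MAX_CONSTS)
        return -1;
    t->entries[t->count] = *cc;
    return (int)t->count++;
}

/* Run time: roughly what a matchCharClass op would do with its
 * constant-table argument. No class is built here, only one bit tested. */
int match_char_class(const ConstTable *t, int idx, unsigned char c) {
    assert(idx >= 0 && (size_t)idx < t->count);
    return (t->entries[idx].bits[c >> 3] >> (c & 7)) & 1;
}
```

With this layout, compiling /[aeiou]/ and later /[uoiea]/ would yield the same constant index, and the loop body at run time costs one shift and one mask per character tested.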

> >For
> >ASCII or iso-8859 or whatever regular ol' bytes are properly called, I
> >would expect \w \s \d charclasses to be constants. In fact, all
> >character classes would be constants. And as Dax mentioned, the
> >constructors for those constants would properly be internal functions.
> 
> Sure, the predefined ones would be, and they'd get loaded up along with the 
> character encoding libraries.

OK, so they're even more constant :-), but I'm talking about constants
in the sense that

    my $x = 18.34;

emits a constant floating-point value 18.34. In the same way, wouldn't

    if (/[aeiou]/)

emit a constant vowel charclass?

> >For UTF-32 etc., I don't know. I was thinking we'd have to have
> >something like a multi-level lookup table for character classes. I see
> >a character class as a full-blown ADT with operators for
> >addition/unions, subtraction/intersections, etc.
> 
> Ah, point. A bitmap won't work too well with the full UTF-32 set.
> 
> Having a good set of set operations would be useful for the core, though.

No argument there.
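
To make the multi-level lookup idea concrete, here is a rough sketch of a two-level table: the code point range is split into 256-entry pages, and a page is allocated only when it has members. The names (WideClass, wc_add, wc_has, wc_combine), the page size, and the 0x10FFFF upper bound are all assumptions for illustration; nothing here is taken from Parrot itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BITS  256u                        /* code points per page */
#define PAGE_BYTES (PAGE_BITS / 8)
#define MAX_CP     0x10FFFFu                   /* assumed upper bound */
#define NUM_PAGES  ((MAX_CP + 1) / PAGE_BITS)  /* 4352 first-level slots */

typedef struct {
    uint8_t *pages[NUM_PAGES];   /* NULL means "no members in this page" */
} WideClass;

typedef enum { WC_UNION, WC_INTERSECT, WC_DIFFERENCE } WcOp;

void wc_init(WideClass *wc) { memset(wc, 0, sizeof *wc); }

void wc_free(WideClass *wc) {
    for (size_t p = 0; p < NUM_PAGES; p++) free(wc->pages[p]);
    wc_init(wc);
}

/* Add one code point, allocating its page on first use. -1 on OOM. */
int wc_add(WideClass *wc, uint32_t cp) {
    assert(cp <= MAX_CP);
    uint8_t **page = &wc->pages[cp / PAGE_BITS];
    if (!*page && !(*page = calloc(1, PAGE_BYTES))) return -1;
    uint32_t off = cp % PAGE_BITS;
    (*page)[off >> 3] |= (uint8_t)(1u << (off & 7));
    return 0;
}

/* Membership test: one first-level index, then one bit test. */
int wc_has(const WideClass *wc, uint32_t cp) {
    if (cp > MAX_CP) return 0;
    const uint8_t *page = wc->pages[cp / PAGE_BITS];
    uint32_t off = cp % PAGE_BITS;
    return page ? (page[off >> 3] >> (off & 7)) & 1 : 0;
}

/* dst = a OP b, page by page. dst must be freshly wc_init'ed.
 * Pages that come out empty are not allocated. -1 on OOM. */
int wc_combine(WideClass *dst, const WideClass *a, const WideClass *b, WcOp op) {
    static const uint8_t zero[PAGE_BYTES] = {0};
    for (size_t p = 0; p < NUM_PAGES; p++) {
        const uint8_t *pa = a->pages[p] ? a->pages[p] : zero;
        const uint8_t *pb = b->pages[p] ? b->pages[p] : zero;
        uint8_t out[PAGE_BYTES];
        uint8_t any = 0;
        for (size_t i = 0; i < PAGE_BYTES; i++) {
            switch (op) {
            case WC_UNION:     out[i] = (uint8_t)(pa[i] | pb[i]);  break;
            case WC_INTERSECT: out[i] = (uint8_t)(pa[i] & pb[i]);  break;
            default:           out[i] = (uint8_t)(pa[i] & ~pb[i]); break;
            }
            any |= out[i];
        }
        if (!any) continue;
        if (!(dst->pages[p] = malloc(PAGE_BYTES))) return -1;
        memcpy(dst->pages[p], out, PAGE_BYTES);
    }
    return 0;
}
```

The first level costs one pointer per page whether or not the page is used, so a sparse class stays small while lookup remains constant time. Union, intersection, and difference reduce to bytewise operations on matching pages.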

> >You aren't thinking that the regular expression _compiler_ needs to be
> >written in Parrot opcodes, are you? I assumed you'd reach it through
> >some callout mechanism in the same way that eval"" will be handled.
> 
> The core of the parser's still a bit up in the air. Larry's leaning towards 
> it being in perl.

When you say "parser", do you mean parser + bytecode generator +
optimizer + syntax analyzer? (Of which only the bytecode generator is
relevant to [:classes:], I suppose.)
