Quoting Dan Sugalski ([EMAIL PROTECTED]):
> At 11:54 AM 11/5/2001 -0800, Steve Fink wrote:
> > > >It's pretty
> > > >much functional, including reOneof. Still, these could be useful
> > > >internal functions... *ponder*
> > >
> > > I was thinking that the places they could come in really handy for were
> > > character classes. \w, \s, and \d are potentially a lot faster this way,
> > > 'specially if you throw in Unicode support. (The sets get rather a bit
> > > larger...) It also may make some character-set independence easier.
> >
> >But why would you be generating character classes at runtime?
>
> Because someone does:
>
>    while (<>) {
>      next unless /[aeiou]/;
>    }
>
> and we want that character class to be reasonably fast?

? So don't generate it at runtime. When you generate the opcode sequence
for the regex, emit a bit vector into the constant table and refer to it
by address in the matchCharClass op's arguments. Be fancy and check that
you haven't already emitted that bit vector. Am I missing something?

> >For
> >ASCII or iso-8859 or whatever regular ol' bytes are properly called, I
> >would expect \w \s \d charclasses to be constants. In fact, all
> >character classes would be constants. And as Dax mentioned, the
> >constructors for those constants would properly be internal functions.
>
> Sure, the predefined ones would be, and they'd get loaded up along with the
> character encoding libraries.

Ok, so they're even more constant :-), but I'm talking about constants
in the sense that

    my $x = 18.34;

emits a constant 18.34 floating point value in the same way that

    if (/[aeiou]/)

would emit a constant vowel charclass?

> >For UTF-32 etc., I don't know. I was thinking we'd have to have
> >something like a multi-level lookup table for character classes. I see
> >a character class as a full-blown ADT with operators for
> >addition/unions, subtraction/intersections, etc.
>
> Ah, point. A bitmap won't work too well with the full UTF-32 set.
>
> Having a good set of set operations would be useful for the core, though.

No argument there.

> >You aren't thinking that the regular expression _compiler_ needs to be
> >written in Parrot opcodes, are you? I assumed you'd reach it through
> >some callout mechanism in the same way that eval"" will be handled.
>
> The core of the parser's still a bit up in the air. Larry's leaning towards
> it being in perl.

When you say "parser", do you mean parser + bytecode generator +
optimizer + syntax analyzer? (Of which only the bytecode generator is
relevant to [:classes:], I suppose.)
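
To make the constant-table suggestion above concrete, here is a rough
sketch in C. It is not Parrot code: CharClass, ConstTable,
const_table_intern and match_char_class are names made up for
illustration, the op refers to its constant by table index rather than
by address, and the 256-bit vector only covers single-byte encodings
(the UTF-32 case would need something like the multi-level table
mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLASS_BYTES   32   /* 256 bits, one per possible byte value */
#define MAX_CONSTANTS 64   /* arbitrary cap, just for this sketch   */

/* Hypothetical constant: a bit vector with one bit per byte value. */
typedef struct {
    uint8_t bits[CLASS_BYTES];
} CharClass;

/* Hypothetical constant table holding only character classes. */
typedef struct {
    CharClass entries[MAX_CONSTANTS];
    size_t    count;
} ConstTable;

/* Build a class from the characters listed between the brackets,
 * e.g. "aeiou" for /[aeiou]/. Ranges and negation are left out. */
CharClass charclass_from_members(const char *members)
{
    CharClass cc;
    memset(&cc, 0, sizeof cc);
    for (const unsigned char *p = (const unsigned char *)members; *p; p++)
        cc.bits[*p >> 3] |= (uint8_t)(1u << (*p & 7));
    return cc;
}

/* Compile time: return the index of an identical vector if one was
 * already emitted, otherwise append it. Linear scan; a real table
 * would presumably hash the vector instead. */
size_t const_table_intern(ConstTable *t, const CharClass *cc)
{
    for (size_t i = 0; i < t->count; i++)
        if (memcmp(t->entries[i].bits, cc->bits, CLASS_BYTES) == 0)
            return i;
    assert(t->count < MAX_CONSTANTS);
    t->entries[t->count] = *cc;
    return t->count++;
}

/* Run time: roughly what a matchCharClass op would do with the
 * constant index found in its arguments. Returns 1 or 0. */
int match_char_class(const ConstTable *t, size_t idx, unsigned char c)
{
    return (t->entries[idx].bits[c >> 3] >> (c & 7)) & 1;
}
```

With this, /[aeiou]/ and /[uoiea]/ would intern to the same table
entry, and the per-character test at match time is a shift and a mask
with nothing built at runtime.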