Mike Lambert:
(a bunch of stuff about regexes)

No offense intended, but I had trouble understanding that, and I helped come up with the thing. :^) So, I'll try to interpret.

In Perl 5, we came up against the problem of simply running out of characters in regexes. To deal with this, Larry came up with the (?_regex) syntax, where _ is some character. Although a clever use of an otherwise impossible sequence, it's also gratuitously ugly. Consider the many roles (?_) plays:

    Non-capturing parentheses: (?:)
    Look(ahead|behind)s: (?=), (?!), (?<=), (?<!)
    Inline code: (?{}), (??{})
    Inline modifiers: (?imsx-imsx), (?imsx-imsx:)
    Conditionals: (?()), (?()|)
    Comments: (?#)
    Non-backtracking: (?>)

Obviously, this is getting out of hand--using more than one or two of those constructs makes your regex much harder to read.

Let's first tackle non-capturing parentheses and lookarounds. If we think about what metacharacters are around, we can realize that {} is only legal with numbers inside it. [0] That means that we can probably reuse it. If we think about it, we can derive a few basic categories:

    -consuming (_) or not (|) [1]
        Reasoning: _ is fat, | is skinny
    -positive (=) or negative (!)
        Reasoning: same as in Perl 5
    -forwards (>) or backwards (<)
        Reasoning: same as in Perl 5

The characters in parentheses are prefix characters that indicate which is to be used. A simple mapping of the five things this section covers follows:

    Perl 5          Perl 6
    ------          ------
    (?:regex)       {_=>regex}
    (?=regex)       {|=>regex}
    (?!regex)       {|!>regex}
    (?<=regex)      {|=<regex} [2]
    (?<!regex)      {|!<regex}

Obviously, that's a bit much to type. But if we define some reasonable defaults, it becomes more manageable. By default, the specifier is _=>. So here's a map of what you're more likely to see in a regex:

    Perl 5          Perl 6
    ------          ------
    (?:regex)       {regex}
    (?=regex)       {|regex}
    (?!regex)       {|!regex}
    (?<=regex)      {|<regex}
    (?<!regex)      {|!<regex}

However, the sharp reader might have noticed that there were three possibilities missing from the above tables. That's right--we get free features too!

    {_!>regex} -- Nonsensical.
    {_=<regex} -- Match backwards. [3]
    {_!<regex} -- Nonsensical.

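To make the bookkeeping above a bit more concrete, here is a rough sketch of how the defaults fill in a shortened specifier and which of the eight combinations have no Perl 5 counterpart. It's in Python rather than Perl purely for illustration, and the names (expand, PERL5, DEFAULTS) are made up for this sketch; nothing here is part of the actual proposal beyond the three specifier characters and the _=> default.

```python
import itertools
import re

# Default specifier from the post: consuming (_), positive (=), forwards (>).
DEFAULTS = ("_", "=", ">")

# Full specifiers that correspond to an existing Perl 5 construct.
PERL5 = {
    "_=>": "(?:...)",
    "|=>": "(?=...)",
    "|!>": "(?!...)",
    "|=<": "(?<=...)",
    "|!<": "(?<!...)",
}

# Each of the three slots is optional; slots always appear in this order.
_SPEC = re.compile(r"([_|]?)([=!]?)([<>]?)")


def expand(prefix):
    """Fill in omitted specifier characters with the defaults."""
    m = _SPEC.fullmatch(prefix)
    if m is None:
        raise ValueError("not a valid specifier prefix: %r" % prefix)
    return "".join(part or default
                   for part, default in zip(m.groups(), DEFAULTS))


# All eight combinations, and the ones with no Perl 5 equivalent.
ALL_SPECS = ["".join(c) for c in itertools.product("_|", "=!", "><")]
MISSING = [s for s in ALL_SPECS if s not in PERL5]

if __name__ == "__main__":
    for short in ["", "|", "|!", "|<", "|!<", "<"]:
        full = expand(short)
        print("{%sregex} -> {%sregex} -> %s"
              % (short, full, PERL5.get(full, "no Perl 5 equivalent")))
    print("missing:", MISSING)
```

Note that this only looks at the prefix in isolation. It says nothing about telling a specifier apart from ordinary regex text that happens to start with one of these characters.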
Well, one free feature--we end up with reversed regexes from this deal. The final table looks like this:

    Perl 5          Perl 6
    ------          ------
    (?:regex)       {regex}
    N/A             {<regex}
    (?=regex)       {|regex}
    (?!regex)       {|!regex}
    (?<=regex)      {|<regex}
    (?<!regex)      {|!<regex}

He then went on to describe something I didn't understand at all. Sorry.

--- BEGIN MY THOUGHTS ---

The only major drawback I can see to that is the naïve user might type {<b>.*?</b>}+ expecting a bunch of text in bold tags and getting a lookbehind instead--so it may be wise to leave the | and _ specifiers out of this altogether, and come up with a better way. I'll address that point shortly. In the meantime, let's consider some of the other syntaxes.

The inline code things are a good opportunity for improvement--and they have a good alternative. In Perl 5, ({ ought not to be legal, but it is--it's hacked in to be the same as (\{. So, we can drop a question mark from each of the block forms, getting ({code}) and (?{code}). However, we can go even further by combining the two. Here's how it works:

    -If the code returns undef, we backtrack.
    -If the code returns the empty string, we move on.
    -If the code returns anything else, we interpolate that into the regex.

So, we now just have ({}).

Comments can go, since Larry has said that /x will be on by default anyway.

That leaves conditionals, non-backtracking sections, inline modifiers, and (maybe) non-capturing parens. We now have three characters that aren't valid in these places: *, +, and ?. My suggestion is this:

    Thing               Syntax          Logic
    -----               ------          -----
    Conditionals        (?()|)          The question mark makes sense for a conditional.
    Inline Modifiers    (?imsx-imsx)    Might as well be a little bit compatible.
    Non-backtracking    (+)             + requires more than * does.
    Non-capturing       (*)             Suggestions welcome. :^)

So, my final suggestions are:

    Perl 5          Perl 6
    ------          ------
    (?:)            (*)
    (?=)            {}
    (?!)            {!}
    (?<=)           {<}
    (?<!)           {<!} [4]
    (?())           (?())
    (?()|)          (?()|)
    (?imsx-imsx)    (?imsx-imsx)
    (?imsx-imsx:)   (?imsx-imsx:)
    (?>)            (+)
    (?{})           ({}) returning empty string
    (??{})          ({}) returning a string or regex
    (?#)            N/A--obsolete

Please feel free to comment on these.

[0] Perl won't be the first tool to take advantage of this--lex uses something similar for named subexpressions.

[1] Neither of these characters is ideal, however. | looks like !, and _ might reasonably be at the beginning of this sort of thing anyway. Better suggestions are welcome.

[2] Mike originally had all the backwards matches as sexegers. I think this is a bad idea, but feel obligated to mention that.

[3] This seems a bit useless to me too. It's probably more useful to have a /r modifier on the entire regex.

[4] I changed the ordering for this one to avoid an ambiguity.

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

#define private public
    --Spotted in a C++ program just before a #include