> He then went on to describe something I didn't understand at all.

Sorry.
A few corrections to what you wrote:

To avoid the problem of extending {} to support new features with a character 'x', without breaking stuff that might have an 'x' immediately after the '{', my proposal is to require one space after the { before the real regex appears. So to correct the example I wrote of /{a|b|c}+/, it would become /{ a|b|c}+/. It looks a bit weird if you're accustomed to perl5's (?:) behavior. { \ } would then match a single space. { } would do nothing, since the second space falls under the whitespace-insensitive regex rule.

Now, since we require a space, all the characters before this space become 'special' in some form. This allows us to add new special characters and map them to functionality, if perl doesn't already do that. For example, I would register | to be:

    sub zerowidth ($regex) {
        return <<"EOF";
            push \@pos, pos();
            regex_run $( qr/$regex/ );
            pos() = pop \@pos;
    EOF
    }

And conversely, _ would be written as:

    sub regularwidth ($r) { return "regex_run $( qr/$r/ )" }

This would allow me to do whacky things, like register these:

    sub plus  ($r) { return "\$level++;regex_run $( qr/$r/ )" }
    sub minus ($r) { return "\$level--;check();regex_run $( qr/$r/ )" }
    sub check { assert($level>0) }

    { {+ \(} | {- \)} | . } ({ check() })

Brent and I also disagreed on the use of sexegers. japhy has done more thinking about this than either of us has, so perhaps we should just let him weigh in on the issue. I proposed that {< be a sexeger, whereas he prefers that {< be a lookbehind. I'll use the former for the rest of this discussion, since on IRC we had to agree to disagree on it.

Regardless, support for sexegers covers all of the behavior of lookbehinds, since Perl5 lookbehinds must be fixed-width (typically just a constant string) and can never be an arbitrary regex. I still like the way lookbehinds work, and am not suggesting that they disappear entirely, but rather that they be changed into an underlying sexeger form.
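To make the sexeger idea concrete outside of any proposed syntax, here is a minimal sketch in Python (not Perl, and not part of the proposal). It assumes that a "sexeger" simply means reversing the text before the current position and matching a reversed pattern against it from the start. The names behind_matches and find_with_lookbehind are made up for illustration, and the reversed pattern for the variable-width case is written by hand rather than derived automatically.

```python
import re


def behind_matches(text, pos, reversed_pattern):
    """Sexeger-style check: reverse everything before `pos` and try to
    match `reversed_pattern` at the start of that reversed prefix."""
    reversed_prefix = text[:pos][::-1]
    return re.match(reversed_pattern, reversed_prefix) is not None


def find_with_lookbehind(text, behind_literal, ahead_pattern):
    """Emulate /(?<=behind_literal)ahead_pattern/ and return the start
    offsets of every match. The literal is reversed and escaped, much
    like reversing a string and wrapping it in \\Q...\\E."""
    ahead = re.compile(ahead_pattern)
    reversed_behind = re.escape(behind_literal[::-1])
    starts = []
    for pos in range(len(text) + 1):
        if ahead.match(text, pos) and behind_matches(text, pos, reversed_behind):
            starts.append(pos)
    return starts


if __name__ == "__main__":
    # Constant-string case: same offsets as a native lookbehind.
    print(find_with_lookbehind("foobar bar foobar", "foo", "bar"))
    # Variable-width case: "fo+" reversed by hand is "o+f".
    print(behind_matches("fooobar", 4, r"o+f"))
```

The constant-string case gives the same offsets as a native /(?<=foo)bar/. The second call shows why the reversed form is the more general primitive: a variable-width "fo+" behind the match is just "o+f" applied to the reversed prefix.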
In the proposed syntax, a lookbehind could then be registered as:

    sub b ($reg) {
        my $ger = reverse $reg;
        return "regex_run qr/{<|= \Q$ger\E}/"
    }

The following perl5 regex:

    /(?<=foo)bar/

is now equivalent to:

    /(b foo)bar/

> The only major drawback I can see to that is the naïve user might type
> {<b>.*?</b>}+ expecting a bunch of text in bold tags and getting a

Sorry, I forgot to make that clearer. The above regex would have to be written as { <b>.*?</b>}+ to work properly, the leading space specifying that there are no special tokens.

> Here's how it works:
> -If the code returns undef, we backtrack.
> -If the code returns the empty string, we move on.
> -If the code returns anything else, we interpolate that into the
> regex.
>
> So, we now just have ({}).

({print "hello"}) will, unfortunately, be really weird. Since print returns 1, the block will return 1, and that 1 would be interpolated into the regex. We'd have to force-specify a return value of "". While simplifying the set of operators is good, and I want to do a bunch of that myself, we should probably offer a way to get the previous 'execute with no interpolated regex' behavior, somehow built on top of the existing ({}) operator.

Reflecting on it all a bit, if we're willing to make a larger sacrifice in backwards compatibility, things might make more sense:

- {} would be the code operator, which was specified up above as ({}). This makes more sense, imo, since {} is traditionally used for blocks.
- () would have all the special semantics described for {} in this thread. The default for () could still be capturing, so ( a*) performs capturing on /a*/. We'd then have to define another pair of symbols for turning capturing on and off.

All instances of Perl5's (blah) would convert to ( blah), and all instances of the special operators in perl5 a la (?@#blah) would translate as they did before, but also specifying the 'don't capture within these parens' special identifier.

Basically, I'm trying to propose a system which makes all the regex stuff become orthogonal.
Rather than creating a bunch of hardcoded types of (?>= regex operators, we would instead define small pieces of functionality which can be combined to emulate these tried-and-true constructs.

Brent, let me know if I'm still spouting gibberish in this email. :)

Mike Lambert