> He then went on to describe something I didn't understand at all.
> Sorry.

A few corrections to what you wrote:

The problem with extending {} so that a character 'x' right after the '{'
selects a new feature is that it breaks any existing regex that happens to
have an 'x' immediately after the '{'. To avoid that, my proposal is to
require one space after the { before the real regex appears.

So to correct the example I wrote of /{a|b|c}+/, it would become
/{ a|b|c}+/. It looks a bit weird if you're accustomed to perl5's behavior
of (?:). { \ } would then match a single space. {  } would do nothing,
since the second space falls under the whitespace-insensitive regex rule.

Now, since we require a space, all the characters before this space
now become 'special' in some form. This fact allows us to add new
special characters and map them to functionality, if perl doesn't
already do that.
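
The splitting rule can be pictured with a small Python sketch. This is
only an illustration of the rule described above: split_group is a made-up
helper name, and it deliberately ignores nested groups and
backslash-escaped spaces.

```python
def split_group(group):
    """Split '{specials body}' at the first space after the '{'.

    Hypothetical helper for illustration only: no nesting, no escapes.
    Returns (specials, body).
    """
    if not (group.startswith("{") and group.endswith("}")):
        raise ValueError("expected a {...} group")
    inner = group[1:-1]
    # Everything before the first space is 'special'; the rest is the regex.
    specials, sep, body = inner.partition(" ")
    if not sep:
        raise ValueError("a space is required before the real regex")
    return specials, body


print(split_group("{ a|b|c}"))    # no special characters: plain grouping
print(split_group("{x a|b|c}"))   # 'x' is special, 'a|b|c' is the regex
```

Under this rule the old form {a|b|c} has no space at all, so it can be
detected and rejected instead of being silently misread.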

For example, I would register | to be:
sub zerowidth ($regex) {
  return <<"EOF";
  push \@pos, pos();
  regex_run $( qr/$regex/ );
  pos() = pop \@pos;
  EOF
}

And conversely, _ would be written as:
sub regularwidth ($r) {
  return "regex_run $( qr/$r/ )"
}
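
To make the difference between the two concrete, here is a rough Python
sketch of the same idea on top of Python's re module. The function names
mirror the subs above, but the (pattern, text, pos) signature is an
assumption made for the example, not part of the proposal.

```python
import re


def regularwidth(pattern, text, pos):
    """Match pattern at pos and consume it.

    Returns the new position, or None if there is no match at pos.
    """
    m = re.compile(pattern).match(text, pos)
    return m.end() if m else None


def zerowidth(pattern, text, pos):
    """Match pattern at pos, then put the position back where it was."""
    saved = pos                                   # push @pos, pos();
    matched = regularwidth(pattern, text, pos) is not None
    return saved if matched else None             # pos() = pop @pos;


print(regularwidth("foo", "foobar", 0))   # position moves past 'foo'
print(zerowidth("foo", "foobar", 0))      # position stays where it was
```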


This would allow me to do whacky things, like register these:
sub plus ($r) {return "\$level++;regex_run $( qr/$r/ )"}
sub minus($r) {return "\$level--;check();regex_run $( qr/$r/ )"}
sub check     {assert($level>0)}
{ {+ \(} | {- \)} | . } ({ check() })


Brent and I also disagreed on the use of sexegers. japhy has done more
thinking about this than either of us has, so perhaps we should just let
him weigh in on the issue. I proposed that {< be a sexeger, whereas Brent
prefers that {< be a lookbehind. I'll use the former for the rest of this
discussion, since on IRC we had to agree to disagree on it.

Regardless, support for sexegers covers all of the behavior of
lookbehinds, since a Perl5 lookbehind is just a fixed-length pattern (in
practice usually a constant string) and can never be an arbitrary regex.
I still like the way lookbehinds work, and am not suggesting that they
disappear entirely, but rather that they be reimplemented on top of an
underlying sexeger form.

sub b ($reg) {
  my $ger = reverse $reg;
  return "regex_run qr/{<|= \Q$ger\E}/"
}

The following perl5 regex:
/(?<=foo)bar/
is now equivalent to:
/{b foo}bar/

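The equivalence itself can be checked today without any new syntax. The
following Python sketch compares a normal lookbehind with the sexeger
trick: reverse the string, reverse both literals, and turn the lookbehind
into a lookahead. The function names and the default 'foo'/'bar' arguments
are just for illustration, and it only handles literal strings.

```python
import re


def lookbehind_starts(text, behind="foo", word="bar"):
    """Start offsets of `word` wherever it is preceded by `behind`."""
    pattern = "(?<=" + re.escape(behind) + ")" + re.escape(word)
    return [m.start() for m in re.finditer(pattern, text)]


def sexeger_starts(text, behind="foo", word="bar"):
    """Same offsets, found by matching backwards over the reversed text."""
    rev = text[::-1]
    # Reversed word first, then a lookahead for the reversed prefix.
    pattern = re.escape(word[::-1]) + "(?=" + re.escape(behind[::-1]) + ")"
    # The end of a match in the reversed text is the start in the original.
    return sorted(len(text) - m.end() for m in re.finditer(pattern, rev))


sample = "bar foobar xfoobar"
print(lookbehind_starts(sample) == sexeger_starts(sample))
```

Since the reversed side uses a lookahead, nothing stops the reversed
prefix from being a full regex, which is the extra power being argued for
here.
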

> The only major drawback I can see to that is the naïve user might type
> {<b>.*?</b>}+ expecting a bunch of text in bold tags and getting a

Sorry, I forgot to make that clearer. The above regex would have to be
written as { <b>.*?</b>}+ to work properly, with the leading space
specifying that there are no special tokens.

> Here's how it works:
>       -If the code returns undef, we backtrack.
>       -If the code returns the empty string, we move on.
>       -If the code returns anything else, we interpolate that into the
> regex.
>
> So, we now just have ({}).

({print "hello"}) will, unfortunately, be really weird. Since print
returns 1, the block will return 1, and that 1 would be interpolated into
the regex. We'd have to force-specify a return value of "".
While simplifying the set of operators is good, and I want to do a bunch
of that myself, we should probably offer a way to get the old 'execute
with no interpolated regex' behavior, somehow built on top of the
existing ({}) operator.
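
A Python sketch of the three-way rule and of one possible wrapper follows.
All of the names here (apply_code_block, quiet, perl_print) are made up
for the example, and perl_print only imitates Perl's print returning 1;
Python's None stands in for undef.

```python
def apply_code_block(result):
    """Decide what the engine does with a code block's return value."""
    if result is None:                     # undef: backtrack
        return ("backtrack", None)
    if result == "":                       # empty string: move on
        return ("continue", None)
    return ("interpolate", str(result))    # anything else becomes regex text


def quiet(block):
    """Wrap block so it runs for side effects only and always returns ''."""
    def wrapper(*args):
        block(*args)
        return ""
    return wrapper


def perl_print(msg, out):
    """Stand-in for Perl's print: records msg, returns 1 on success."""
    out.append(msg)
    return 1


out = []
print(apply_code_block(perl_print("hello", out)))          # the weird case
print(apply_code_block(quiet(perl_print)("hello", out)))   # side effect only
```

Something like quiet could be the 'execute with no interpolated regex'
form, defined purely in terms of ({}).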



Reflecting on it all a bit, if we're willing to make a larger sacrifice
in backwards compatibility, the whole thing might end up making more sense:
- {} would be the code operator, which was specified up above as ({}).
  This makes more sense, imo, since {} is traditionally used for
  blocks.
- () would have all the special semantics described for {} in this
  thread.

The default for () could still be capturing, so ( a*) performs capturing
on /a*/. We'd then have to define another pair of symbols for turning
capturing on and off. All instances of Perl5's (blah) would convert to
( blah), and all instances of the special operators in Perl5 a la
(?@#blah) would translate as they did before, but with the addition of
the "don't capture within these parens" special identifier.

Basically, I'm trying to propose a system which makes all the regex stuff
orthogonal. Rather than hardcoding a bunch of (?>= style regex operators,
we would define small pieces of functionality which can be combined to
emulate these tried-and-true constructs.

Brent, let me know if I'm still spouting gibberish in this email. :)

Mike Lambert

