==================================================================
RFC 72: The regexp engine should go backward as well as
forward. (Peter Heslin)
Peter says (edited):
:If the regexp code is unlikely to be rewritten from the ground up, then
:there may be little chance of this feature being implemented. I'll make
:a pitch for it anyway at the end of my talk at YAPC::Europe, and then
:I'll freeze the RFC.
==================================================================
As I have said many times: this is absolutely trivial to implement.
First of all, if you agree to rewrite
(?<= \w\s*\d ) # Semantic X: match "a 1"
as
(?<= \d\s*\w ) # Semantic Y: match "a 1"
then it is as simple as inserting go-back-by(1) nodes before each of the
nodes for \s, \d, and \w.
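
Semantic Y can be imitated outside the engine, as a rough sketch. The
helper below (lookbehind_y is a hypothetical name, not anything in Perl or
in Python's re module) reverses the text before the current position and
matches the already-reversed pattern against it. It copies the string
instead of inserting go-back-by(1) nodes, so it shows only the semantics,
not the proposed implementation.

```python
import re

def lookbehind_y(text, pos, reversed_pattern):
    """Emulate a semantic-Y lookbehind at text[pos].

    reversed_pattern is written right-to-left (e.g. \\d\\s*\\w), and it is
    matched against the text before pos, read backward.
    """
    backward = text[:pos][::-1]          # characters before pos, last first
    return re.match(reversed_pattern, backward) is not None

# "a 1" read backward from its end is "1 a", which \d\s*\w matches.
found = lookbehind_y("a 1", 3, r"\d\s*\w")
```

Python's re accepts only fixed-width lookbehind, so \w\s*\d cannot be
written there as (?<= ... ) directly; the reversed match sidesteps that.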
And to support the more intuitive ;-) semantic X, the only
more-or-less tricky part is to recursively go through the compile
tree, and put "concatenated" nodes in the opposite order. A piece of
cake.
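
The recursive reordering can be sketched on a toy parse tree. The node
shapes below ("atom", "cat", "alt", "star") are assumptions made for
illustration and have nothing to do with the real regnode layout; atoms
are taken to be single-character matchers, so they need no reversal of
their own.

```python
import re

# Hypothetical toy tree:
#   ("atom", r"\w")        single-character matcher
#   ("cat", [n1, n2, ...]) concatenation
#   ("alt", [n1, n2, ...]) alternation
#   ("star", n)            zero or more repetitions

def reverse_tree(node):
    """Return a tree that matches the mirror image of what node matches."""
    kind = node[0]
    if kind == "atom":
        return node
    if kind == "cat":
        # The only real work: children in the opposite order, each reversed.
        return ("cat", [reverse_tree(child) for child in reversed(node[1])])
    if kind == "alt":
        return ("alt", [reverse_tree(child) for child in node[1]])
    if kind == "star":
        return ("star", reverse_tree(node[1]))
    raise ValueError(f"unknown node kind: {kind!r}")

def to_regex(node):
    """Render the toy tree back into regex source text."""
    kind = node[0]
    if kind == "atom":
        return node[1]
    if kind == "cat":
        return "".join(to_regex(child) for child in node[1])
    if kind == "alt":
        return "(?:" + "|".join(to_regex(child) for child in node[1]) + ")"
    if kind == "star":
        return "(?:" + to_regex(node[1]) + ")*"
    raise ValueError(f"unknown node kind: {kind!r}")

# Semantic X as the user writes it: \w \s* \d
semantic_x = ("cat", [("atom", r"\w"), ("star", ("atom", r"\s")), ("atom", r"\d")])
semantic_y = reverse_tree(semantic_x)    # renders as \d(?:\s)*\w
```

A real compiler would also have to reverse multi-character literals and
deal with captures; this sketch leaves both out.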
==================================================================
RFC 145: Brace-matching for Perl Regular Expressions (Eric Roode)
The closest we have to an emerging consensus appears to be that
it is very difficult to pin down a precise problem to solve - the
areas in which we want to match pairs of delimiters (such as
numeric expressions, C code, perl code, HTML and XML) each seem
to require a variety of special cases, each different from the
other.
==================================================================
Emacs shows the bare minimum one needs to support: marking chars by syntax
classes. Which classes there should be is a tricky question. Emacs's way is
too C-centric.
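
As a rough illustration of marking chars by syntax class, the sketch below
keeps the classes in a table and finds the delimiter that closes a given
opener. The table layout and the names SYNTAX and find_matching_close are
hypothetical; these are not Emacs's actual syntax classes. Because the
classes are plain data, another language can supply its own table.

```python
# Hypothetical syntax table: character -> (class, partner character).
SYNTAX = {
    "(": ("open", ")"), ")": ("close", "("),
    "[": ("open", "]"), "]": ("close", "["),
    "{": ("open", "}"), "}": ("close", "{"),
}

def find_matching_close(text, start, table=SYNTAX):
    """Return the index of the delimiter closing text[start], or None."""
    cls, _ = table.get(text[start], ("other", None))
    if cls != "open":
        raise ValueError("text[start] is not an opening delimiter")
    expected = []                        # stack of closers still awaited
    for i in range(start, len(text)):
        cls, partner = table.get(text[i], ("other", None))
        if cls == "open":
            expected.append(partner)
        elif cls == "close":
            if not expected or expected.pop() != text[i]:
                return None              # mismatched pair
            if not expected:
                return i                 # the opener at start is now closed
    return None                          # ran out of text

# A different table is enough to match angle brackets instead.
ANGLE = {"<": ("open", ">"), ">": ("close", "<")}
```

This handles none of the special cases (strings, comments, escapes) that
make the real problem hard; each of those would need further classes.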
==================================================================
I have no time to summarize the things I feel are needed. But since
they can be easily done in the Perl5 track as well, maybe they are not
proper for this list. And I have discussed all of them many times already...
"unfinished strings", (allows $/ = /fo*ba*r/)
\g< and \g> (report start/end of $& at these pos);
onion rings: (?<> A <> B &! C & D) (the substring matched by A such that
B and D match against it, but C does not; in B, C, D, \A and \z denote the
boundaries of what was matched by A);
\F{-*}, \F{-.}, \F+ (finish and restart the match "where"), here
"where" is nowhere/at-the-current-position/as-usual, and -/+ mean
whether one needs to report this match to the caller;
applying a REx to a substring (two versions: with/without allowing
lookahead/behind outside of the range);
(*@arr: REx )   # Make @arr the default-match-array instead of ($1,$2,...)
                # (@arr is not interpolated)
(*%hash: REx )  # Make %hash the default-match-hash instead of %^MATCH
(*id: REx )     # Put what-is-matched into $default_match_hash{id}
(*id*: REx )*   # As REx*, but put what-is-matched during each REx
                # into separate elements of @{$default_match_hash{id}}
(*id[]: REx )   # make @{$default_match_hash{id}} into default-match-array
(*id{}: REx )   # make %{$default_match_hash{id}} into default-match-hash
                # all of the above are localized for the duration of REx
as well as many performance improvements.
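
Of the items above, the onion rings are the easiest to approximate outside
the engine. The sketch below is only an approximation under stated
assumptions: onion_search is a hypothetical helper, and it tests only the
leftmost non-overlapping matches of A, whereas a real implementation would
presumably backtrack into A to try other substrings. The side conditions
are searched against the matched substring alone, so \A and \Z (Python's
spelling of \z) mean the boundaries of what A matched.

```python
import re

def onion_search(text, a, must_match=(), must_not_match=()):
    """Find a match of `a` whose substring satisfies the side conditions.

    must_match plays the role of B and D; must_not_match plays C.
    """
    for m in re.finditer(a, text):
        inner = m.group(0)               # what A matched, taken in isolation
        if (all(re.search(p, inner) for p in must_match)
                and not any(re.search(p, inner) for p in must_not_match)):
            return m
    return None

# A word starting with a lowercase letter and ending in "s", with no digit.
hit = onion_search("x1s cats 9dogs", r"\w+",
                   must_match=[r"\A[a-z]", r"s\Z"],
                   must_not_match=[r"\d"])
```

Here "x1s" is rejected because it contains a digit, and "cats" is the
first candidate that satisfies all three conditions.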
Yours,
Ilya