Re: perl6-language-regex summary for 20000920

Ilya Zakharevich Wed, 27 Sep 2000 13:04:12 -0700
==================================================================
RFC 72: The regexp engine should go backward as well as
        forward. (Peter Heslin)

Peter says (edited):
:If the regexp code is unlikely to be rewritten from the ground up, then
:there may be little chance of this feature being implemented. I'll make
:a pitch for it anyway at the end of my talk at YAPC::Europe, and then
:I'll freeze the RFC.
==================================================================

As I have said many times: this is absolutely trivial to implement.
First of all, if you agree to rewrite

 (?<= \w\s*\d )         # Semantic X: match "a  1"

as

 (?<= \d\s*\w )         # Semantic Y: match "a  1"

then it is as simple as inserting go-back-by(1) nodes before each node
for \s \d and \w.
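
To make the go-back-by(1) idea concrete, here is a toy sketch in plain
Perl. It is not the real engine: the node format ([class, quantifier])
and the name match_backward are made up for illustration, and only the
quantifiers '1' and '*' are handled.

```perl
use strict;
use warnings;

# Toy illustration only. Each node is [ class_regex, quantifier ], where the
# quantifier is '1' (exactly once) or '*' (zero or more, greedy).
# Nodes are listed in "semantic Y" order: the first node is the one
# closest to the current position.
sub match_backward {
    my ($str, $pos, @nodes) = @_;
    return $pos unless @nodes;      # all nodes consumed: report where we stopped
    my ($class, $quant) = @{ shift @nodes };

    if ($quant eq '1') {
        return if $pos < 1;
        # go-back-by(1), then test the character we landed on
        return unless substr($str, $pos - 1, 1) =~ $class;
        return match_backward($str, $pos - 1, @nodes);
    }

    # '*': step back greedily, then give characters back one at a time
    my $p = $pos;
    $p-- while $p > 0 && substr($str, $p - 1, 1) =~ $class;
    for (my $q = $p; $q <= $pos; $q++) {
        my $r = match_backward($str, $q, @nodes);
        return $r if defined $r;
    }
    return;
}

# Semantic Y for "a  1": \d first, then \s*, then \w.
my @nodes = ([qr/\d/, '1'], [qr/\s/, '*'], [qr/\w/, '1']);
print match_backward("a  1", 4, @nodes), "\n";   # prints 0
```

The returned value is the position where the backward match ended, so
the lookbehind succeeds at position 4 because the text from 0 to 4
satisfies the reversed pattern.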

And to support the more intuitive ;-) semantic X, the only
more-or-less tricky part is to recursively go through the compile
tree, and put "concatenated" nodes in the opposite order.  A piece of
cake.
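
The recursive pass can be sketched as follows. The tree layout (node
types 'seq', 'alt', 'star', 'class') and the helper names reverse_tree
and to_string are assumptions made for this example; the real compile
tree looks different, but the shape of the walk is the same: recurse
everywhere, reverse child order only in concatenations.

```perl
use strict;
use warnings;

# Hypothetical compile tree:
#   [ 'seq',   @children ]   concatenation
#   [ 'alt',   @children ]   alternation
#   [ 'star',  $child ]      zero or more
#   [ 'class', $name ]       leaf such as \w, \s, \d
sub reverse_tree {
    my ($node) = @_;
    my ($type, @kids) = @$node;
    return [ @$node ] if $type eq 'class';       # leaves are unchanged
    my @new = map { reverse_tree($_) } @kids;    # recurse into every child
    @new = reverse @new if $type eq 'seq';       # only concatenation flips
    return [ $type, @new ];
}

# Print a tree back as a pattern, just to inspect the result.
sub to_string {
    my ($node) = @_;
    my ($type, @kids) = @$node;
    return $kids[0] if $type eq 'class';
    if ($type eq 'star') {
        my $inner = to_string($kids[0]);
        $inner = "(?:$inner)" if $kids[0][0] ne 'class';
        return $inner . '*';
    }
    my @parts = map { to_string($_) } @kids;
    return join('', @parts) if $type eq 'seq';
    return '(?:' . join('|', @parts) . ')';      # 'alt'
}

# Semantic X as the user wrote it: \w\s*\d
my $x = [ 'seq', [ 'class', '\w' ],
                 [ 'star', [ 'class', '\s' ] ],
                 [ 'class', '\d' ] ];
print to_string(reverse_tree($x)), "\n";   # prints \d\s*\w
```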

==================================================================
RFC 145: Brace-matching for Perl Regular Expressions  (Eric Roode)

The closest we have to an emerging consensus appears to be that
it is very difficult to pin down a precise problem to solve - the
areas in which we want to match pairs of delimiters (such as
numeric expressions, C code, perl code, HTML and XML) each seem
to require a variety of special cases, each different from the
other.
==================================================================

Emacs shows the bare minimum that needs support: marking chars by syntax
classes.  Which classes there should be is a tricky question.  Emacs's own
set is too C-centric.
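
A rough sketch of what "mark chars by syntax classes" buys you, in
plain Perl. The table %syntax, its three classes (open, close, quote)
and the function find_matching_close are all invented for this
example; escapes inside strings and comments are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical syntax table: char => [ class, partner ].
# Anything not listed is an ordinary character.
my %syntax = (
    '(' => [ open  => ')' ], ')' => [ 'close' ],
    '[' => [ open  => ']' ], ']' => [ 'close' ],
    '{' => [ open  => '}' ], '}' => [ 'close' ],
    '"' => [ 'quote' ],
);

# Starting at an opening delimiter, return the index of its matching
# closing delimiter, or nothing if the text is unbalanced.
sub find_matching_close {
    my ($str, $start, $table) = @_;
    my (@stack, $in_quote);
    for my $i ($start .. length($str) - 1) {
        my $ch    = substr($str, $i, 1);
        my $entry = $table->{$ch} or next;       # ordinary char: skip
        my ($class, $partner) = @$entry;
        if (defined $in_quote) {                 # inside a string: only the
            undef $in_quote                      # same quote char ends it
                if $class eq 'quote' && $ch eq $in_quote;
            next;
        }
        if    ($class eq 'quote') { $in_quote = $ch }
        elsif ($class eq 'open')  { push @stack, $partner }
        elsif ($class eq 'close') {
            return unless @stack && pop(@stack) eq $ch;
            return $i unless @stack;
        }
    }
    return;                                      # ran off the end
}

print find_matching_close('f(a[1], ")") + g(2)', 1, \%syntax), "\n";  # prints 11
```

Changing the table, not the algorithm, is what adapts this to another
language, which is why the choice of classes is the hard part.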

==================================================================

I have no time to summarize the things I feel are needed.  But since
they can be easily done in the Perl5 track as well, maybe they are not
proper for this list.  And I discussed all of them many times already...

   "unfinished strings",               (allows $/ = /fo*ba*r/)

   \g< and \g>                         (report start/end of $& at these pos);

   onion rings: (?<> A <> B &! C & D)  (substring matched by A
                                        such that B and D match against
                                        it but C does not; inside B, C
                                        and D, \A and \z denote the
                                        boundaries of what A matched);

   \F{-*}, \F{-.}, \F+  (finish and restart the match "where"), here
   "where" is nowhere/at-the-current-position/as-usual, and -/+ mean
   whether one needs to report this match to the caller;

   applying a REx to a substring (two versions: with/without allowing
     lookahead/behind outside of the range);

   (*@arr:  REx )  # Make @arr the default-match-array instead of ($1,$2,...)
                   # (@arr is not interpolated)

   (*%hash: REx )  # Make %hash the default-match-hash instead of %^MATCH

   (*id:    REx )  # Put what-is-matched into $default_match_hash{id}

   (*id*:   REx )* # As REx*, but put what-is-matched during each REx
                   # into separate elements of @{$default_match_hash{id}}

   (*id[]:  REx )  # make @{$default_match_hash{id}} into default-match-array

   (*id{}:  REx )  # make %{$default_match_hash{id}} into default-match-hash

                   # all of the above are localized for the duration of REx

as well as many performance improvements.
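
For a feel of what (*id: REx) and (*id*: REx)* would store, here is an
approximation using features of later Perl 5 releases (named captures
and %+, available from 5.10). This is not the syntax proposed above,
only an emulation; the input string and the group names key, val and
pair are made up.

```perl
use strict;
use warnings;

my $input = 'width=10;height=20;depth=5;';

# Roughly what (*key: \w+) and (*val: \d+) would leave in the
# default match hash: one entry per named group.
my %match;
if ($input =~ /\A (?<key> \w+ ) = (?<val> \d+ ) ;/x) {
    %match = %+;
}
print "$match{key} -> $match{val}\n";   # prints width -> 10

# Roughly what (*pair*: \w+=\d+ )* would store: one array element per
# iteration, collected here by hand with \G and /gc.
my @pairs;
pos($input) = 0;
while ($input =~ /\G (?<pair> \w+ = \d+ ) ;/gcx) {
    push @pairs, $+{pair};
}
print join(', ', @pairs), "\n";         # prints width=10, height=20, depth=5
```

The explicit loop is exactly the bookkeeping the proposed (*id*: ...)*
form would move into the engine.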

Yours,
Ilya