- In my mind, there is no question that Rust should have a RE engine. I should hate to see you blocked.

If written in Rust, macros could come out to play. Please see CL-PPCRE for some ideas on how this has been done before.

- I would suggest that the "standard" Perl regexes be used. Backreferences can be useful, and I don't think it's a good idea to preclude adding them later, even if they are not in the initial release.

- I have not had to do much with Unicode besides hiss rudely at it: doing it well would mean that including it does not affect ANSI development as we happily wander down our 7-bit rural ways. ;)
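To make "macros could come out to play" a little more concrete (it also bears on the re!() syntax extension raised in the quoted message), here is a sketch in present-day Rust rather than the 2013 syntax-extension API. Both `re!` and `parens_balanced` are hypothetical names: the macro only checks, at compile time, that the pattern's parentheses are balanced and then hands back the pattern string. A real re!() would parse the full syntax and could emit a compiled matcher in the same way.

```rust
/// Hypothetical compile-time check: are the parentheses in `pat` balanced?
/// A backslash escapes the following byte, so `\(` is treated as a literal.
const fn parens_balanced(pat: &str) -> bool {
    let bytes = pat.as_bytes();
    let mut depth: usize = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 1, // skip the escaped byte
            b'(' => depth += 1,
            b')' => {
                if depth == 0 {
                    return false; // a ')' with no matching '('
                }
                depth -= 1;
            }
            _ => {}
        }
        i += 1;
    }
    depth == 0
}

/// Hypothetical `re!` macro: the check runs while the surrounding code is
/// compiled, so `re!("(a")` is a build error rather than a runtime one.
macro_rules! re {
    ($pat:literal) => {{
        const PAT: &str = {
            assert!(parens_balanced($pat), "unbalanced parentheses in regex");
            $pat
        };
        PAT
    }};
}
```

Usage would look like `let pattern = re!("(a|b)*c");`, which compiles and yields the pattern string, while a malformed literal fails the build.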

On 5/4/13 6:54 AM, Devin Jeanpierre wrote:
With regard to <https://github.com/mozilla/rust/issues/3591>, I'd
like to write a regular expression module for Rust. I've written a
couple of regular expression engines in Python for fun in the past[*],
and #rust pressured me to utilize my perverse sense of fun to write
the same for Rust. Actually, the reason I learned Rust was to port a
bunch of my regex code to a nice language. :)

I'm writing this email because
https://github.com/mozilla/rust/wiki/Library-editing told me to. I
don't know much about the process, but as I understand it this marks
the beginning of a one week discussion period where Rust-Dev fleshes
out ideas for such a module, and whether or not it deserves to be
written, and whether or not I should be the one writing it.

I've also added this library page to the wiki:
https://github.com/mozilla/rust/wiki/Lib-re


I've already discussed this somewhat with some people in #rust;
Marvin Löbel (kimundi) in particular has been interested in helping me
come up with a nice API. Hopefully we can put that down in writing
here so that it isn't just in our memory.


Some questions to start off with:

- Should rust have a new regex engine written in Rust, or should it
just have bindings for e.g. RE2 or similar?

     A point brought up in #rust: if we use RE2 or similar, we may not
     be able to have a re!() syntax extension that compiles regexps at
     the same time as the surrounding Rust code.

     I prefer the former, because I wanted to write a new regex engine
     regardless. I would be perfectly happy to write some nice bindings
     for something like RE2, but I am probably not the best person to do it.

- What syntax/semantics are important?

     I would propose supporting the "usual" PCRE syntax and semantics
     (including submatch extraction), but with the exception of
     backreferences and any other features which cannot be implemented
     efficiently (i.e. polynomial time).

     RE2 has a good summary of regex syntax, although it doesn't
     specify for PCRE-family syntax whether it comes from perl, libpcre,
     python, or something else.

         http://code.google.com/p/re2/wiki/Syntax

     Note: if Rust's re module is efficient, syntax for things like
     possessive quantifiers is pointless and can be dropped.

     It may be desirable to include alternate parse disambiguation
     strategies. With an "efficient" RE engine, it's fairly easy to support
     POSIX-style longest match, as well as PCRE-style matches and even
     shortest match. For example, RE2 supports both PCRE-style and
     POSIX-style matching.

- How important is Unicode support and how broad should that support be?

     My understanding is that, as long as it can be added later, this
     is not crucial to get right immediately.

     Unicode TR-18 defines 3 levels of Unicode support in regex
     implementations, of which only the first two are relevant. I think the
     only thing missing from core::unicode to give level 1 support is
     simple case folding.

     * https://github.com/mozilla/rust/issues/5820
     * http://www.unicode.org/reports/tr18/
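On the second question: the reason for excluding backreferences is that, without them, a pattern can be compiled to an NFA and simulated in lockstep over the input (Thompson-style), which takes time proportional to len(input) × len(program) instead of the exponential worst case of a backtracking matcher. The following is a minimal sketch under that assumption, not a proposed API: the `Node`/`Inst` types, `build`, and `full_match` are hypothetical names, only literals, `.`, concatenation, `|` and `*` are supported, and it only answers whether the whole input matches (no submatch extraction).

```rust
/// Hypothetical tiny regex AST; a real module would parse PCRE-style syntax into this.
enum Node {
    Lit(char),
    Any,
    Cat(Box<Node>, Box<Node>),
    Alt(Box<Node>, Box<Node>),
    Star(Box<Node>),
}

#[derive(Clone, Copy)]
enum Inst {
    Lit(char),
    Any,
    Split(usize, usize), // follow both branches "in parallel"
    Jmp(usize),
    Match,
}

fn compile(node: &Node, prog: &mut Vec<Inst>) {
    match node {
        Node::Lit(c) => prog.push(Inst::Lit(*c)),
        Node::Any => prog.push(Inst::Any),
        Node::Cat(a, b) => {
            compile(a, prog);
            compile(b, prog);
        }
        Node::Alt(a, b) => {
            let split = prog.len();
            prog.push(Inst::Split(0, 0)); // patched below
            compile(a, prog);
            let jmp = prog.len();
            prog.push(Inst::Jmp(0)); // patched below
            let b_start = prog.len();
            compile(b, prog);
            let end = prog.len();
            prog[split] = Inst::Split(split + 1, b_start);
            prog[jmp] = Inst::Jmp(end);
        }
        Node::Star(a) => {
            let split = prog.len();
            prog.push(Inst::Split(0, 0)); // patched below
            compile(a, prog);
            prog.push(Inst::Jmp(split)); // loop back to try another repetition
            let end = prog.len();
            prog[split] = Inst::Split(split + 1, end);
        }
    }
}

fn build(node: &Node) -> Vec<Inst> {
    let mut prog = Vec::new();
    compile(node, &mut prog);
    prog.push(Inst::Match); // accepting state
    prog
}

/// Add `pc` plus everything reachable from it through Split/Jmp; `seen`
/// guarantees each instruction is added at most once per input position.
fn add_state(prog: &[Inst], pc: usize, set: &mut Vec<usize>, seen: &mut [bool]) {
    if seen[pc] {
        return;
    }
    seen[pc] = true;
    match prog[pc] {
        Inst::Split(x, y) => {
            add_state(prog, x, set, seen);
            add_state(prog, y, set, seen);
        }
        Inst::Jmp(x) => add_state(prog, x, set, seen),
        _ => set.push(pc),
    }
}

/// True if the whole input matches; runs in O(len(input) * len(prog)).
fn full_match(prog: &[Inst], input: &str) -> bool {
    let mut current = Vec::new();
    let mut seen = vec![false; prog.len()];
    add_state(prog, 0, &mut current, &mut seen);
    for ch in input.chars() {
        let mut next = Vec::new();
        let mut seen = vec![false; prog.len()];
        for &pc in &current {
            match prog[pc] {
                Inst::Lit(c) if c == ch => add_state(prog, pc + 1, &mut next, &mut seen),
                Inst::Any => add_state(prog, pc + 1, &mut next, &mut seen),
                _ => {}
            }
        }
        current = next;
    }
    current.iter().any(|&pc| matches!(prog[pc], Inst::Match))
}
```

Because each state is added at most once per input character, a pattern like `(a*)*b`, which can send a naive backtracking matcher into exponential time on a long run of `a`s, is rejected here in linear time.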
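On parse disambiguation: the strategies differ only in which match they report when several are possible. The hypothetical `find` helper below restricts patterns to an alternation of literal strings so the difference is easy to see; a real engine would track these priorities inside the NFA simulation. For `a|ab` against `"xab"`, Perl/PCRE-style leftmost-first reports `a` (the first alternative that matches), POSIX-style leftmost-longest reports `ab`, and a shortest policy picks the shortest alternative at the leftmost start.

```rust
#[derive(Clone, Copy, Debug)]
enum Policy {
    LeftmostFirst,   // Perl/PCRE: first alternative in pattern order wins
    LeftmostLongest, // POSIX: longest match at the leftmost start wins
    Shortest,        // shortest match at the leftmost start wins
}

/// Hypothetical helper: search for `alts[0]|alts[1]|...` (literal alternatives
/// only) and return the (start, end) byte span chosen by `policy`.
fn find(alts: &[&str], haystack: &str, policy: Policy) -> Option<(usize, usize)> {
    for start in 0..=haystack.len() {
        if !haystack.is_char_boundary(start) {
            continue;
        }
        let rest = &haystack[start..];
        // Lengths of every alternative that matches here, in pattern order.
        let lens: Vec<usize> = alts
            .iter()
            .filter(|a| rest.starts_with(**a))
            .map(|a| a.len())
            .collect();
        let chosen = match policy {
            Policy::LeftmostFirst => lens.first().copied(),
            Policy::LeftmostLongest => lens.iter().copied().max(),
            Policy::Shortest => lens.iter().copied().min(),
        };
        if let Some(len) = chosen {
            return Some((start, start + len));
        }
    }
    None
}
```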
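On Unicode: TR-18 Level 1 asks for case-insensitive matching based on simple case folding (RL1.5, "Simple Loose Matches"). Present-day Rust's standard library does not expose a case-folding function, so the sketch below approximates simple folding with `char::to_lowercase`, keeping the character unchanged when the lowercase mapping expands to several characters. The helper names are hypothetical, and this is an approximation for illustration, not a correct fold: for example, Unicode's CaseFolding.txt folds `ς` (final sigma) to `σ`, which `to_lowercase` does not do.

```rust
/// Approximate simple case folding via `char::to_lowercase`.
/// Assumption: NOT identical to CaseFolding.txt (e.g. 'ς' is left alone here).
fn simple_fold(c: char) -> char {
    let mut lower = c.to_lowercase();
    match (lower.next(), lower.next()) {
        (Some(l), None) => l, // one-to-one mapping: use it
        _ => c,               // multi-char mapping: keep the original char
    }
}

fn caseless_eq(a: char, b: char) -> bool {
    simple_fold(a) == simple_fold(b)
}

/// Case-insensitive literal prefix match, char by char, roughly what an
/// engine would do for a literal under a case-insensitive flag.
fn caseless_starts_with(haystack: &str, needle: &str) -> bool {
    let mut h = haystack.chars();
    needle
        .chars()
        .all(|n| h.next().map_or(false, |c| caseless_eq(c, n)))
}
```

Matching per character like this is what keeps the folding compatible with the NFA simulation: a literal instruction just compares folded characters.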


That's probably enough to start off with, especially since the answer
to question 1 ties our hands on everything afterwards. Also since my
hands hurt from typing. However, there's a lot of other topics, like
what the API should look like, whether or not to support various
syntaxes, etc. I've added a lot of links and a few additional topics
to the library proposal page.

     link again: https://github.com/mozilla/rust/wiki/Lib-re

Let me know if there's something I've left out of here or of the
library proposal page. When there's more discussion / I have more
energy, I will suggest some of my personal ideas of what I'd like in a
regex module, but somehow I don't feel that's appropriate for the
top-level post.

-- Devin Jeanpierre

.. [*] Here's the work I did before

     https://bitbucket.org/devin.jeanpierre/re0/
         an attempt at getting "everything"; it failed at the end since
         I couldn't do assertions in O(1) space.

     https://bitbucket.org/devin.jeanpierre/replay/
         "CS" style regexps _without_ submatch extraction; this was me
         exploring lots of implementation strategies to get ideas for solving
         the above problem. Still not complete.
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev



--
Regards,
Paul
