Devin,

Personally, I would like to see a regex engine written in Rust.  It could
also serve as a reference workload for the language design going forward,
with additional tests bound to it.  A win-win in my opinion, since it helps
prove out the language design in the first place.

Java, Python, and Ruby, to name a few, share very similar syntax styles, or
flavors.  You can scroll down this page to see the subtle differences
between all the flavors, including those three:
http://www.regular-expressions.info/refflavors.html

I am sure you intend to give the Rust regex engine a very similar flavor,
so I am not worried.  Just cover the basics with that aligned flavor and
most folks will be quite happy; everything else can easily be looked up in
a reference, as is usually the case for most developers, unless you live
and breathe regex every day.

Unicode Level 1, along with most of Level 2, would be a requirement in my
opinion.  It is the world we live in today, thankfully, at last. :)
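
To make the Level 1 point a little more concrete, here is a rough sketch of
the kind of caseless comparison that simple case folding enables.  It is
only an illustration, not a proposed API: the names fold and eq_caseless
are made up, it is written against current Rust std, and it approximates
simple case folding with per-character lowercasing because std does not
expose the Unicode CaseFolding.txt table.

```rust
/// Hypothetical helper: approximate simple case folding for one char.
/// Assumption: a single-char lowercase mapping stands in for the real
/// Unicode simple case folding table, which std does not provide.
fn fold(c: char) -> char {
    let mut lower = c.to_lowercase();
    match (lower.next(), lower.next()) {
        // Exactly one char came back: use it as the folded form.
        (Some(l), None) => l,
        // Multi-char mappings are "full" folding; leave the char alone.
        _ => c,
    }
}

/// Hypothetical helper: compare two strings char by char after folding.
fn eq_caseless(a: &str, b: &str) -> bool {
    a.chars().map(fold).eq(b.chars().map(fold))
}
```

With this, eq_caseless("LÖBEL", "löbel") holds, while "Straße" and
"STRASSE" stay different, since that pair needs full rather than simple
folding.  One known gap: real simple case folding maps final sigma to
sigma, which plain lowercasing does not, so a real engine would want the
actual folding table.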




On Sat, May 4, 2013 at 8:54 AM, Devin Jeanpierre <[email protected]> wrote:

> With regards to < https://github.com/mozilla/rust/issues/3591 >, I'd
> like to write a regular expression module for Rust. I've written a
> couple of regular expression engines in Python for fun in the past[*],
> and #rust pressured me to utilize my perverse sense of fun to write
> the same for Rust. Actually, the reason I learned Rust was to port a
> bunch of my regex code to a nice language. :)
>
> I'm writing this email because
> https://github.com/mozilla/rust/wiki/Library-editing told me to. I
> don't know much about the process, but as I understand it this marks
> the beginning of a one week discussion period where Rust-Dev fleshes
> out ideas for such a module, and whether or not it deserves to be
> written, and whether or not I should be the one writing it.
>
> I've also added this library page to the wiki:
> https://github.com/mozilla/rust/wiki/Lib-re
>
>
> I've already discussed this somewhat with some people in #rust;
> Marvin Löbel (kimundi) in particular has been interested in helping me
> come up with a nice API. Hopefully we can put that down in writing
> here so that it isn't just in our memory.
>
>
> Some questions to start off with:
>
> - Should rust have a new regex engine written in Rust, or should it
> just have bindings for e.g. RE2 or similar?
>
>     A point brought up in #rust: if we use RE2 or similar, we may not
>     be able to have a re!() syntax extension that compiles regexps at
>     the same time as the surrounding rust code.
>
>     I prefer the former, because I wanted to write a new regex engine
>     regardless. I would be perfectly happy to write some nice bindings
>     for something like RE2, but I am probably not the best person to do it.
>
> - What syntax/semantics are important?
>
>     I would propose supporting the "usual" PCRE syntax and semantics
>     (including submatch extraction), but with the exception of
>     backreferences and any other features which cannot be implemented
>     efficiently (i.e. in polynomial time).
>
>     RE2 has a good summary of regex syntax, although it doesn't
>     specify for PCRE-family syntax whether it comes from perl, libpcre,
>     python, or something else.
>
>         http://code.google.com/p/re2/wiki/Syntax
>
>     Note: if rust's re module is efficient, syntax for things like
>     possessive quantifiers is pointless and can be dropped.
>
>     It may be desirable to include alternate parse disambiguation
>     strategies. Using "efficient" RE, it's fairly easy to support
>     POSIX-style longest match, as well as PCRE-style matches and even
>     shortest match. For example, RE2 offers support for PCRE-style and
>     also POSIX style regex matching.
>
> - How important is Unicode support and how broad should that support be?
>
>     My understanding is that, at least as long as it can be added
>     later, this is not crucial to get right straight away.
>
>     Unicode TR-18 defines 3 levels of Unicode support in regex
>     implementations, of which only the first two are relevant. I think the
>     only thing missing from core::unicode to give level 1 support is
>     simple case folding.
>
>     * https://github.com/mozilla/rust/issues/5820
>     * http://www.unicode.org/reports/tr18/
>
>
> That's probably enough to start off with, especially since the answer
> to question 1 ties our hands on everything afterwards. Also since my
> hands hurt from typing. However, there are a lot of other topics, like
> what the API should look like, whether or not to support various
> syntaxes, etc. I've added a lot of links and a few additional topics
> to the library proposal page.
>
>     link again: https://github.com/mozilla/rust/wiki/Lib-re
>
> Let me know if there's something I've left out of here or of the
> library proposal page. When there's more discussion / I have more
> energy, I will suggest some of my personal ideas of what I'd like in a
> regex module, but somehow I don't feel that's appropriate in the
> top-level post.
>
> -- Devin Jeanpierre
>
> .. [*] Here's the work I did before
>
>     https://bitbucket.org/devin.jeanpierre/re0/
>         an attempt at getting "everything"; it failed at the end since
>         I couldn't do assertions in O(1) space
>
>     https://bitbucket.org/devin.jeanpierre/replay/
>         "CS" style regexps _without_ submatch extraction; this was me
>         exploring lots of implementation strategies to get ideas for
>         solving the above problem. Still not complete.
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
>



-- 
-Thad
http://www.freebase.com/view/en/thad_guidry
