Devin, I personally would like to see a Rust regex engine. It could also serve as a new reference standard for the language design going forward, one that additional tests could be bound to. A win-win, in my opinion, for proving the language design in the first place.
Java, Python, and Ruby, to name a few, share very similar syntax styles, or flavors. You can scroll down this page to see the subtle differences between all the flavors, including those three:

http://www.regular-expressions.info/refflavors.html

I am sure you intend to give the Rust regex engine a very similar flavor, so personally I am not worried. Just cover the basics with that aligned Rust regex flavor and most folks will be quite happy; everything else can easily be looked up in a reference, as most developers usually do unless they live and breathe regex every day.

Unicode Level 1, plus most of Level 2, would be a requirement in my opinion. It is the world we live in today, thankfully, at last. :)

On Sat, May 4, 2013 at 8:54 AM, Devin Jeanpierre wrote:
> With regards to < https://github.com/mozilla/rust/issues/3591 >, I'd
> like to write a regular expression module for Rust. I've written a
> couple of regular expression engines in Python for fun in the past[*],
> and #rust pressured me to utilize my perverse sense of fun to write
> the same for Rust. Actually, the reason I learned Rust was to port a
> bunch of my regex code to a nice language. :)
>
> I'm writing this email because
> https://github.com/mozilla/rust/wiki/Library-editing told me to. I
> don't know much about the process, but as I understand it this marks
> the beginning of a one-week discussion period where Rust-Dev fleshes
> out ideas for such a module, and whether or not it deserves to be
> written, and whether or not I should be the one writing it.
>
> I've also added this library page to the wiki:
> https://github.com/mozilla/rust/wiki/Lib-re
>
>
> I've already discussed this somewhat with some people in #rust;
> Marvin Löbel (kimundi) in particular has been interested in helping me
> come up with a nice API. Hopefully we can put that down in writing
> here so that it isn't just in our memory.
>
>
> Some questions to start off with:
>
> - Should rust have a new regex engine written in Rust, or should it
> just have bindings for e.g. RE2 or similar?
>
> A point brought up in #rust: if we use RE2 or similar, we may not
> be able to have a re!() syntax extension that compiles regexps at
> the same time as the surrounding rust code.
>
> I prefer the former, because I wanted to write a new regex engine
> regardless. I would be perfectly happy to write some nice bindings
> for something like RE2, but I am probably not the best person to do it.
>
> - What syntax/semantics are important?
>
> I would propose supporting the "usual" PCRE syntax and semantics
> (including submatch extraction), but with the exception of
> backreferences and any other features which cannot be implemented
> efficiently (i.e. in polynomial time).
>
> RE2 has a good summary of regex syntax, although it doesn't
> specify for PCRE-family syntax whether it comes from perl, libpcre,
> python, or something else.
>
> http://code.google.com/p/re2/wiki/Syntax
>
> Note: if rust's re module is efficient, syntaxes for things like
> possessive quantifiers are pointless and can be dropped.
>
> It may be desirable to include alternate parse disambiguation
> strategies. Using "efficient" RE, it's fairly easy to support
> POSIX-style longest match, as well as PCRE-style matches and even
> shortest match. For example, RE2 offers support for PCRE-style and
> also POSIX-style regex matching.
>
> - How important is Unicode support and how broad should that support be?
>
> My understanding is that, at least as long as it can be added
> later, this is not crucial to get correct right away.
>
> Unicode TR-18 defines 3 levels of Unicode support in regex
> implementations, of which only the first two are relevant. I think the
> only thing missing from core::unicode to give level 1 support is
> simple case folding.
>
> * https://github.com/mozilla/rust/issues/5820
> * http://www.unicode.org/reports/tr18/
>
>
> That's probably enough to start off with, especially since the answer
> to question 1 ties our hands on everything afterwards. Also since my
> hands hurt from typing. However, there are a lot of other topics, like
> what the API should look like, whether or not to support various
> syntaxes, etc. I've added a lot of links and a few additional topics
> to the library proposal page.
>
> link again: https://github.com/mozilla/rust/wiki/Lib-re
>
> Let me know if there's something I've left out of here or of the
> library proposal page. When there's more discussion / I have more
> energy, I will suggest some of my personal ideas of what I'd like in a
> regex module, but somehow I don't feel that's appropriate in the
> top-level post.
>
> -- Devin Jeanpierre
>
> .. [*] Here's the work I did before
>
> https://bitbucket.org/devin.jeanpierre/re0/
> (an attempt at getting "everything"; it failed at the end since
> I couldn't do assertions in O(1) space)
>
> https://bitbucket.org/devin.jeanpierre/replay/
> "CS" style regexps _without_ submatch extraction; this was me
> exploring lots of implementation strategies to get ideas for solving
> the above problem. Still not complete.
> _______________________________________________
> Rust-dev mailing list
> https://mail.mozilla.org/listinfo/rust-dev
>

--
-Thad
http://www.freebase.com/view/en/thad_guidry
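To illustrate the polynomial-time point in the proposal (dropping backreferences so every pattern can be matched efficiently), here is a minimal sketch of a Thompson-style NFA simulation. It is written in modern Rust, not the 2013 dialect this thread targets, and the AST, the instruction set, and every function name are hypothetical, invented for illustration; none of it is the proposed module's API. It supports only literals, concatenation, alternation, and Kleene star, and it tests whole-string matches without submatch extraction.

```rust
// Hypothetical AST for a tiny regex subset (no backreferences).
enum Ast {
    Char(char),
    Concat(Box<Ast>, Box<Ast>),
    Alt(Box<Ast>, Box<Ast>),
    Star(Box<Ast>),
}

// Hypothetical NFA instructions in the style of a Thompson/Pike VM.
#[derive(Clone, Copy)]
enum Inst {
    Char(char),          // consume one matching character
    Split(usize, usize), // epsilon edge to both targets
    Jmp(usize),          // epsilon edge to one target
    Match,               // accepting state
}

fn lit(c: char) -> Ast { Ast::Char(c) }
fn cat(a: Ast, b: Ast) -> Ast { Ast::Concat(Box::new(a), Box::new(b)) }
fn alt(a: Ast, b: Ast) -> Ast { Ast::Alt(Box::new(a), Box::new(b)) }
fn star(a: Ast) -> Ast { Ast::Star(Box::new(a)) }

// Emit instructions for `ast`, patching jump targets once they are known.
fn compile(ast: &Ast, prog: &mut Vec<Inst>) {
    match ast {
        Ast::Char(c) => prog.push(Inst::Char(*c)),
        Ast::Concat(a, b) => {
            compile(a, prog);
            compile(b, prog);
        }
        Ast::Alt(a, b) => {
            let split = prog.len();
            prog.push(Inst::Split(0, 0)); // placeholder, patched below
            compile(a, prog);
            let jmp = prog.len();
            prog.push(Inst::Jmp(0)); // placeholder, patched below
            let b_start = prog.len();
            compile(b, prog);
            let end = prog.len();
            prog[split] = Inst::Split(split + 1, b_start);
            prog[jmp] = Inst::Jmp(end);
        }
        Ast::Star(a) => {
            let split = prog.len();
            prog.push(Inst::Split(0, 0)); // placeholder, patched below
            compile(a, prog);
            prog.push(Inst::Jmp(split)); // loop back to try another repetition
            let end = prog.len();
            prog[split] = Inst::Split(split + 1, end);
        }
    }
}

fn compile_program(ast: &Ast) -> Vec<Inst> {
    let mut prog = Vec::new();
    compile(ast, &mut prog);
    prog.push(Inst::Match);
    prog
}

// Add `pc` and everything reachable through epsilon edges. The `on` flags
// ensure each instruction is visited at most once per input position, which
// also stops loops such as (a*)* from recursing forever.
fn add_thread(prog: &[Inst], set: &mut Vec<usize>, on: &mut [bool], pc: usize) {
    if on[pc] {
        return;
    }
    on[pc] = true;
    match prog[pc] {
        Inst::Jmp(t) => add_thread(prog, set, on, t),
        Inst::Split(a, b) => {
            add_thread(prog, set, on, a);
            add_thread(prog, set, on, b);
        }
        Inst::Char(_) | Inst::Match => set.push(pc),
    }
}

// Whole-string match in O(input length * program length) time: all NFA
// states are advanced in lockstep instead of backtracking.
fn is_full_match(prog: &[Inst], input: &str) -> bool {
    let mut clist = Vec::new();
    let mut on = vec![false; prog.len()];
    add_thread(prog, &mut clist, &mut on, 0);
    for ch in input.chars() {
        let mut nlist = Vec::new();
        let mut non = vec![false; prog.len()];
        for &pc in &clist {
            if let Inst::Char(c) = prog[pc] {
                if c == ch {
                    add_thread(prog, &mut nlist, &mut non, pc + 1);
                }
            }
        }
        clist = nlist;
    }
    clist.iter().any(|&pc| matches!(prog[pc], Inst::Match))
}

fn main() {
    // (a|b)*abb
    let re = cat(star(alt(lit('a'), lit('b'))), cat(lit('a'), cat(lit('b'), lit('b'))));
    let prog = compile_program(&re);
    assert!(is_full_match(&prog, "abb"));
    assert!(is_full_match(&prog, "babaabb"));
    assert!(!is_full_match(&prog, "abab"));

    // (a*)*b against a long run of 'a': a shape that can make naive
    // backtracking matchers take exponential time; here it stays linear.
    let nested = compile_program(&cat(star(star(lit('a'))), lit('b')));
    assert!(!is_full_match(&nested, &"a".repeat(10_000)));
    println!("ok");
}
```

Because each input character touches each instruction at most once, the total work is bounded by the input length times the program size. Backreferences do not fit this scheme: patterns that use them can describe non-regular languages, which no finite automaton recognizes, and matching them is NP-hard in general.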
