With regard to < https://github.com/mozilla/rust/issues/3591 >, I'd like to write a regular expression module for Rust. I've written a couple of regular expression engines in Python for fun in the past[*], and #rust pressured me to put my perverse sense of fun to use by writing the same for Rust. Actually, the reason I learned Rust was to port a bunch of my regex code to a nice language. :)
I'm writing this email because https://github.com/mozilla/rust/wiki/Library-editing told me to. I don't know much about the process, but as I understand it, this marks the beginning of a one-week discussion period in which Rust-Dev fleshes out ideas for such a module and decides whether it deserves to be written, and whether I should be the one writing it.

I've also added this library page to the wiki: https://github.com/mozilla/rust/wiki/Lib-re

I've already discussed this somewhat with some people in #rust; Marvin Löbel (kimundi) in particular has been interested in helping me come up with a nice API. Hopefully we can put that down in writing here so that it isn't just in our memory.

Some questions to start off with:

- Should Rust have a new regex engine written in Rust, or should it just have bindings for e.g. RE2 or similar?

A point brought up in #rust: if we use RE2 or similar, we may not be able to have a re!() syntax extension that compiles regexps at the same time as the surrounding Rust code.

I prefer the former, because I wanted to write a new regex engine regardless. I would be perfectly happy to write some nice bindings for something like RE2, but I am probably not the best person to do it.

- What syntax/semantics are important?

I would propose supporting the "usual" PCRE syntax and semantics (including submatch extraction), with the exception of backreferences and any other features that cannot be implemented efficiently (i.e. in polynomial time).

RE2 has a good summary of regex syntax, although for PCRE-family syntax it doesn't specify whether a given feature comes from Perl, libpcre, Python, or something else: http://code.google.com/p/re2/wiki/Syntax

Note: if Rust's re module is efficient, syntax for things like possessive quantifiers is pointless and can be dropped.

It may be desirable to include alternate parse disambiguation strategies.
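To make "efficient" a bit more concrete: matching with backreferences is NP-hard in general, but everything else can be matched without backtracking by simulating a Thompson NFA over a *set* of states, which takes time polynomial in the input (roughly text length times the number of NFA states, plus a log factor for the `BTreeSet` used here). Below is a minimal sketch of that idea in present-day Rust. It is not a proposed API: the `Re` AST, the `lit`/`cat`/`alt`/`star` helpers, `Nfa`, and `match_ends` are hypothetical names. There is no parser, matching is anchored at the start of the text, and there is no submatch extraction. Because the simulation reports every offset where a match can end, shortest match and POSIX-style longest match fall out of the same run. PCRE-style leftmost-first matching would additionally need per-thread priorities (as in a Pike VM), which this sketch omits.

```rust
use std::collections::BTreeSet;

/// Hypothetical regex AST; patterns are built by hand (there is no parser).
enum Re {
    Lit(char),
    Any,
    Cat(Box<Re>, Box<Re>),
    Alt(Box<Re>, Box<Re>),
    Star(Box<Re>),
}

fn lit(c: char) -> Re { Re::Lit(c) }
fn cat(a: Re, b: Re) -> Re { Re::Cat(Box::new(a), Box::new(b)) }
fn alt(a: Re, b: Re) -> Re { Re::Alt(Box::new(a), Box::new(b)) }
fn star(a: Re) -> Re { Re::Star(Box::new(a)) }

/// NFA states from Thompson's construction; `usize` fields are successor indices.
#[derive(Clone, Copy)]
enum State {
    Lit(char, usize),
    Any(usize),
    Split(usize, usize), // epsilon edges to both targets
    Match,
}

/// Index of the single accepting state.
const MATCH: usize = 0;

struct Nfa {
    states: Vec<State>,
    start: usize,
}

impl Nfa {
    fn new(re: &Re) -> Nfa {
        let mut states = vec![State::Match];
        let start = compile(re, MATCH, &mut states);
        Nfa { states, start }
    }
}

fn push(states: &mut Vec<State>, s: State) -> usize {
    states.push(s);
    states.len() - 1
}

/// Compiles `re` so that a successful match continues at state `next`;
/// returns the entry state of the compiled fragment.
fn compile(re: &Re, next: usize, states: &mut Vec<State>) -> usize {
    match re {
        Re::Lit(c) => push(states, State::Lit(*c, next)),
        Re::Any => push(states, State::Any(next)),
        Re::Cat(a, b) => {
            let b_start = compile(b, next, states);
            compile(a, b_start, states)
        }
        Re::Alt(a, b) => {
            let a_start = compile(a, next, states);
            let b_start = compile(b, next, states);
            push(states, State::Split(a_start, b_start))
        }
        Re::Star(a) => {
            // Reserve the loop state first so the body can jump back to it.
            let split = push(states, State::Split(MATCH, next));
            let body = compile(a, split, states);
            states[split] = State::Split(body, next);
            split
        }
    }
}

/// Adds `s` and every state reachable from it via epsilon (Split) edges.
/// The `insert` check also stops infinite loops on patterns like (a*)*.
fn add(nfa: &Nfa, set: &mut BTreeSet<usize>, s: usize) {
    if !set.insert(s) {
        return;
    }
    if let State::Split(x, y) = nfa.states[s] {
        add(nfa, set, x);
        add(nfa, set, y);
    }
}

/// Runs the NFA anchored at the start of `text` and returns, in increasing
/// order, every byte offset at which a match ends. No backtracking is done.
fn match_ends(nfa: &Nfa, text: &str) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut current = BTreeSet::new();
    add(nfa, &mut current, nfa.start);
    for (i, c) in text.char_indices() {
        if current.contains(&MATCH) {
            ends.push(i);
        }
        // Advance every live state over `c` simultaneously.
        let mut next = BTreeSet::new();
        for &s in &current {
            match nfa.states[s] {
                State::Lit(l, n) if l == c => add(nfa, &mut next, n),
                State::Any(n) => add(nfa, &mut next, n),
                _ => {}
            }
        }
        if next.is_empty() {
            return ends; // no thread survives, so no later match is possible
        }
        current = next;
    }
    if current.contains(&MATCH) {
        ends.push(text.len());
    }
    ends
}

fn main() {
    // a|ab against "abc": a match can end after "a" or after "ab".
    let ends = match_ends(&Nfa::new(&alt(lit('a'), cat(lit('a'), lit('b')))), "abc");
    println!("ends = {:?}, shortest = {:?}, longest = {:?}", ends, ends.first(), ends.last());
    // .b* against "xbbz"
    println!("{:?}", match_ends(&Nfa::new(&cat(Re::Any, star(lit('b')))), "xbbz"));
}
```

For `a|ab` against "abc" this yields end offsets [1, 2], so the shortest match is "a" and the longest is "ab". A PCRE-style engine would also pick "a" here, but only because `a` is the first alternative; with `ab|a` it would pick "ab".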
Using "efficient" RE, it's fairly easy to support POSIX-style longest match, as well as PCRE-style matches and even shortest match. For example, RE2 supports both PCRE-style and POSIX-style regex matching.

- How important is Unicode support, and how broad should that support be?

My understanding is that, at least as long as it can be added later, this is not crucial to get right right away. Unicode TR-18 defines 3 levels of Unicode support in regex implementations, of which only the first two are relevant. I think the only thing missing from core::unicode to give level 1 support is simple case folding.

* https://github.com/mozilla/rust/issues/5820
* http://www.unicode.org/reports/tr18/

That's probably enough to start off with, especially since the answer to question 1 ties our hands on everything afterwards. Also, my hands hurt from typing. However, there are a lot of other topics, like what the API should look like, whether or not to support various syntaxes, etc. I've added a lot of links and a few additional topics to the library proposal page. Link again: https://github.com/mozilla/rust/wiki/Lib-re

Let me know if there's something I've left out of here or of the library proposal page. When there's more discussion / I have more energy, I will suggest some of my personal ideas of what I'd like in a regex module, but somehow I don't feel that's appropriate for the top-level post.

--
Devin Jeanpierre

.. [*] Here's the work I did before:

https://bitbucket.org/devin.jeanpierre/re0/ (an attempt at getting "everything"; it failed at the end since I couldn't do assertions in O(1) space)

https://bitbucket.org/devin.jeanpierre/replay/ ("CS"-style regexps _without_ submatch extraction; this was me exploring lots of implementation strategies to get ideas for solving the above problem. Still not complete.)

_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev
