Jeffrey C. Jacobs <timeho...@users.sourceforge.net> added the comment:
Okay, as I said, Atomic Grouping, etc., based on a recent 2.6 is already available, and I can do any cleanups requested on the items already mentioned; I just don't want to start any new items at the moment. As it is, we are still over a year from any of this seeing the light of day, as it's not going to be merged until we start the 2.7 / 3.1 alpha.

Fortunately, I think Matthew here DOES have a lot of potential to have everything wrapped up by then, but to summarize everyone's concern: we really would like to be able to examine each change incrementally, rather than as a whole. So, for the purposes of this, I would recommend that you, Matthew, make a version of your new engine WITHOUT any Atomic Grouping, variable-length lookbehind / lookahead assertions, reverse string scanning, positional, negated or scoped inline flags, group key indexing or any other feature described in the various issues. We would then evaluate, purely on the merits of the engine itself, whether it is worth moving to that engine and, having made that decision, officially move all work to that design if warranted.

Personally, I'd like to see that 'pure' engine for myself, and maybe we can all develop an appropriate benchmark suite to test it fairly against the existing engine. I also think we should consider things like presentation (are all lines terminated by column 80?), number of comments, and general readability. IMHO, the current code is conformant in line length, but VERY deficient WRT comments and readability, the latter of which it sacrifices for speed (as well as being retrofitted for iteration rather than recursion). I'm no fan of switch-case, but I found that by turning the various case statements into bite-sized functions and adding many, MANY comments, the code became MUCH more readable at a minor cost in speed.
As I think speed trumps readability (though not blindly), I abandoned my work on the engines, but I do feel that if we are going to keep the old engine, I should try to adapt my comments to the old framework to make the current code a bit easier to understand, since the framework is more or less the same code as in the existing engine, just re-arranged. I think all of the things you've added to your engine, Matthew, can, with varying levels of difficulty, be implemented in the existing Regexp Engine, though I'm not suggesting that we start that effort.

Simply, let's evaluate fairly whether your engine is worth the switch-over. Personally, I think the engine has some potential -- though not much better than current WRT readability -- but we've only heard anecdotal evidence of its superior speed. Even if the engine isn't faster, developing speed benchmarks that fairly gauge any potential new engine would be handy for the next person to have a great idea for a rewrite, so perhaps while you pursue the stripped-down version of your engine, the rest of us can work on modifying regex_tests.py, test_re.py and re_tests.py in Lib/test specifically for the purpose of benchmarking.

If we can focus on just these two issues ('pure' engine and fair benchmarks), I think I can devote some time to the latter, as I've dealt a lot with benchmarking (WRT the compiler-cache) and test cases, and I hope to be a bit more active here.

----------
message_count: 69.0 -> 70.0

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________
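For illustration only, here is a minimal sketch of what a "fair" speed comparison between the existing engine and a candidate might look like. It is not taken from Lib/test or from any patch on this issue: the names CASES, time_engine and compare are hypothetical, the patterns are placeholders, and it assumes a current Python 3 and a candidate engine that exposes a compile function with the same interface as re.compile. Since no second engine is assumed to be installed, re.compile stands in for both sides.

```python
import re
import timeit

# Hypothetical benchmark cases: (label, pattern, subject text).
# A real suite would draw these from the existing regression tests.
CASES = [
    ("literal", r"needle", "hay" * 1000 + "needle"),
    ("alternation", r"(?:foo|bar|baz)+", "foobarbaz" * 200),
    ("nested_repeat", r"(a+)+b", "a" * 12 + "b"),
]


def time_engine(compile_fn, cases=CASES, number=200, repeat=5):
    """Return {label: best seconds per search} for one engine.

    compile_fn is any callable with the same interface as re.compile,
    so a candidate engine can be swapped in without changing the cases.
    """
    results = {}
    for label, pattern, text in cases:
        # Compile once, outside the timed loop, so only matching is measured.
        compiled = compile_fn(pattern)
        timer = timeit.Timer(lambda: compiled.search(text))
        # The minimum over several repeats is the least noisy estimate.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[label] = best / number
    return results


def compare(baseline, candidate):
    """Return {label: baseline / candidate}; above 1 means the candidate is faster."""
    return {label: baseline[label] / candidate[label] for label in baseline}


if __name__ == "__main__":
    baseline = time_engine(re.compile)
    # Stand-in: replace re.compile with the candidate engine's compile function.
    candidate = time_engine(re.compile)
    for label, ratio in compare(baseline, candidate).items():
        print(f"{label:14s} speedup x{ratio:.2f}")
```

Running both engines over the same cases, with compilation kept out of the timed loop, separates matching speed from compile-time cost. A fuller suite would presumably time compilation on its own as well.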