Having just looked at the Python source, I can see the 100-group limit is literally just hard-coded into Modules/_sre.c and a few other places. Without looking at it more closely, it is hard to imagine why this is needed. And if you're going to hard-code a limit, surely it could be bumped up to something substantially larger, like 1000 or even 10000, on a modern machine (assuming the limit was there for memory).
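For reference, the limit is easy to reproduce. Here's a minimal sketch that builds a pattern with 150 capturing groups; on interpreters that still enforce the cap, compiling it raises an error, while interpreters without the cap compile it fine:

```python
import re

# Build an alternation with 150 capturing groups -- well past the
# historical 100-group cap enforced by CPython's sre machinery.
pattern = "|".join("(tok%d)" % i for i in range(150))

try:
    compiled = re.compile(pattern)
    # Interpreters without the cap compile this fine.
    print("compiled OK with", compiled.groups, "groups")
except (re.error, AssertionError) as exc:
    # Interpreters that still enforce the cap end up here.
    print("hit the group limit:", exc)
```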
Wonder if this could be pushed for Python 3.3 (still in beta).

Cheers,
Dave

On Aug 20, 2012, at 10:45 AM, Eugene Voytitsky wrote:

> Thanks, Albert.
>
> It seems that I really have only two options:
> 0. Live with it, or
> 1. Recompile Python without the check for that limitation. (And you are
> right; I should ask the Python community. But I decided to share the info here.)
>
> The other options don't suit, because a lot of work has already been done and
> the code is already in production.
>
> On 20.08.12 18:18, A.T.Hofkamp wrote:
>> On 08/17/2012 02:52 PM, viy wrote:
>>> Hi all, jfyi
>>>
>>> I've added just one token to my lexer rules and hit the 100-group limit in
>>> Python re:
>>> http://stackoverflow.com/questions/478458/python-regular-expressions-with-more-than-100-groups
>>>
>>> PLY has a workaround in its code: when your master re exceeds 100 groups,
>>> PLY catches the AssertionError from Python, splits the master re into
>>> parts, and retries.
>>>
>>> All works smoothly, but in my case my unit test suite became 10x slower.
>>> A single parse is about 1.5x slower.
>>>
>>> The obvious solution is to get rid of the Python limitation.
>>> Does anyone know the best way to do so?
>>
>> Re-implement RE? :D
>> Much happiness would spread throughout the Python community, I am sure :)
>>
>> Other solutions include:
>>
>> 0. Live with it. Other solutions may cost more time than you are ever
>> going to save.
>>
>> 1. DIY: you can easily define your own scanner, using arbitrary Python
>> code. Just make sure you match the interface. String scanning is
>> relatively easy; it just takes a lot of code.
>>
>> 2. A long time ago (several years at least), someone wrote a Lex
>> framework. I forget the details, but the mailing list archive or
>> Google can probably help you. IIRC, it was a true lex and had a
>> different approach than using RE.
>>
>> 3. More exotic solutions like writing a scanner C extension (generated
>> with lex/flex) are also possible.
>>
>> 4. Even more exotic stuff like generating a DFA somehow and
>> implementing that in Python can be done.
>>
>> 5. Other Python parser generators may have better solutions (I somewhat
>> doubt it, but it should be easy enough to scan through them, checking
>> how the scanner works).
>>
>> Good luck,
>>
>> Albert

> --
> Best regards,
> Eugene Voytitsky

--
You received this message because you are subscribed to the Google Groups "ply-hack" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
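As an aside, the splitting workaround described above can be sketched roughly like this. This is an illustrative sketch only, not PLY's actual code, and the helper names are hypothetical: combine token regexes into master patterns that each stay under the group cap, then try each master in order when matching.

```python
import re

MAX_GROUPS = 100  # the historical cap in CPython's sre module

def build_master_patterns(token_regexes, max_groups=MAX_GROUPS):
    """Combine (name, regex) token rules into as few master patterns
    as possible, keeping each under the group limit.
    Hypothetical helper for illustration, not PLY's implementation."""
    masters, chunk, groups = [], [], 0
    for name, regex in token_regexes:
        # +1 accounts for the named group wrapping each token rule.
        needed = re.compile(regex).groups + 1
        if chunk and groups + needed > max_groups:
            masters.append(re.compile("|".join(chunk)))
            chunk, groups = [], 0
        chunk.append("(?P<%s>%s)" % (name, regex))
        groups += needed
    if chunk:
        masters.append(re.compile("|".join(chunk)))
    return masters

def match_token(masters, text, pos):
    """Try each master pattern in turn; return (token name, lexeme)
    for the first match, or None if nothing matches at pos."""
    for master in masters:
        m = master.match(text, pos)
        if m:
            return m.lastgroup, m.group()
    return None
```

The cost of the retries is visible here: once the rules are split across several masters, every failed match against an earlier master is wasted work, which is consistent with the slowdown reported above.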
