Having just looked at the Python source, I can see the 100-group limit is literally just hard-coded into Modules/_sre.c and a few other places. Without looking at it more closely, it is hard to imagine why this is needed. And if you're going to hard-code a limit, surely it could be bumped up to something substantially larger, like 1000 or even 10000, on a modern machine (assuming the limit was there for memory).
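For reference, the limit is easy to reproduce. Here's a minimal sketch that builds a pattern with 150 capturing groups; on interpreters that still enforce the cap, compiling it raises an error, while interpreters without the cap compile it fine:

```python
import re

# Build an alternation with 150 capturing groups -- well past the
# historical 100-group cap enforced by CPython's sre machinery.
pattern = "|".join("(tok%d)" % i for i in range(150))

try:
    compiled = re.compile(pattern)
    # Interpreters without the cap compile this fine.
    print("compiled OK with", compiled.groups, "groups")
except (re.error, AssertionError) as exc:
    # Interpreters that still enforce the cap end up here.
    print("hit the group limit:", exc)
```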
Wonder if this could be pushed for Python 3.3 (still in beta).

Cheers,
Dave

On Aug 20, 2012, at 10:45 AM, Eugene Voytitsky wrote:

> Thanks, Albert.
>
> It seems that I really have only two options:
> 0. Live with it, or
> 1. Recompile Python without the check for that limitation. (And you are
> right; I should ask the Python community. But I decided to share the info here.)
>
> The other options don't suit, because a lot of work has already been done and
> the code is already in production.
>
> On 20.08.12 18:18, A.T.Hofkamp wrote:
>> On 08/17/2012 02:52 PM, viy wrote:
>>> Hi all, jfyi
>>>
>>> I've added just one token to my lexer rules and hit the 100-group limit in
>>> Python re:
>>> http://stackoverflow.com/questions/478458/python-regular-expressions-with-more-than-100-groups
>>>
>>> PLY has a workaround in its code: when your master re exceeds 100 groups,
>>> PLY catches the AssertionError from Python, splits the master re into
>>> parts, and retries.
>>>
>>> All works smoothly, but in my case my unit test suite became 10x slower.
>>> A single parse is about 1.5x slower.
>>>
>>> The obvious solution is to get rid of the Python limitation.
>>> Does anyone know the best way to do so?
>>
>> Re-implement RE? :D
>> Much happiness would spread throughout the Python community, I am sure :)
>>
>> Other solutions include:
>>
>> 0. Live with it. Other solutions may cost more time than you are ever
>> going to save.
>>
>> 1. DIY: you can easily define your own scanner, using arbitrary Python
>> code. Just make sure you match the interface. String scanning is
>> relatively easy; it just takes a lot of code.
>>
>> 2. A long time ago (several years at least), someone wrote a Lex
>> framework. I forget the details, but the mailing list archive or
>> Google can probably help you. IIRC, it was a true lex and had a
>> different approach than using RE.
>>
>> 3. More exotic solutions like writing a scanner C extension (generated
>> with lex/flex) are also possible.
>>
>> 4. Even more exotic stuff like generating a DFA somehow and
>> implementing that in Python can be done.
>>
>> 5. Other Python parser generators may have better solutions (I somewhat
>> doubt it, but it should be easy enough to scan through them, checking
>> how the scanner works).
>>
>> Good luck,
>>
>> Albert

> --
> Best regards,
> Eugene Voytitsky

--
You received this message because you are subscribed to the Google Groups "ply-hack" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
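As an aside, the splitting workaround described above can be sketched roughly like this. This is an illustrative sketch only, not PLY's actual code, and the helper names are hypothetical: combine token regexes into master patterns that each stay under the group cap, then try each master in order when matching.

```python
import re

MAX_GROUPS = 100  # the historical cap in CPython's sre module

def build_master_patterns(token_regexes, max_groups=MAX_GROUPS):
    """Combine (name, regex) token rules into as few master patterns
    as possible, keeping each under the group limit.
    Hypothetical helper for illustration, not PLY's implementation."""
    masters, chunk, groups = [], [], 0
    for name, regex in token_regexes:
        # +1 accounts for the named group wrapping each token rule.
        needed = re.compile(regex).groups + 1
        if chunk and groups + needed > max_groups:
            masters.append(re.compile("|".join(chunk)))
            chunk, groups = [], 0
        chunk.append("(?P<%s>%s)" % (name, regex))
        groups += needed
    if chunk:
        masters.append(re.compile("|".join(chunk)))
    return masters

def match_token(masters, text, pos):
    """Try each master pattern in turn; return (token name, lexeme)
    for the first match, or None if nothing matches at pos."""
    for master in masters:
        m = master.match(text, pos)
        if m:
            return m.lastgroup, m.group()
    return None
```

The cost of the retries is visible here: once the rules are split across several masters, every failed match against an earlier master is wasted work, which is consistent with the slowdown reported above.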
