On 08/17/2012 02:52 PM, viy wrote:
Hi all, jfyi

I've added just one token to my lexer rules and hit the 100-group limit in
Python's re module:
http://stackoverflow.com/questions/478458/python-regular-expressions-with-more-than-100-groups


PLY has a workaround in its code: when the master regex exceeds 100 groups, PLY
catches the AssertionError raised by Python, splits the master regex into parts, and retries.
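The workaround follows roughly this pattern (a hedged sketch of the split-and-retry idea, not PLY's actual code): join the token rules into one big alternation of named groups; if the interpreter rejects it for having too many groups, split the rule list in half, compile each half recursively, and match by trying each compiled part in turn.

```python
import re

def compile_master(patterns):
    """Compile a list of (name, regex) pairs into as few master
    regexes as the re module allows (sketch of split-and-retry)."""
    if not patterns:
        return []
    master = "|".join("(?P<%s>%s)" % (name, pat) for name, pat in patterns)
    try:
        return [re.compile(master)]
    except (AssertionError, re.error):
        # Too many groups for this Python version: split in half and retry.
        mid = len(patterns) // 2
        return compile_master(patterns[:mid]) + compile_master(patterns[mid:])

def match_first(regexes, text, pos=0):
    """Try each compiled part in order; return the first match, or None.

    This extra loop is where the slowdown comes from: once the rules
    are split, a failed match against one part costs a full scan of
    its alternation before the next part is even tried.
    """
    for rx in regexes:
        m = rx.match(text, pos)
        if m:
            return m
    return None
```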

Everything works smoothly, but in my case the unit test suite became 10x slower;
a single parse is about 1.5x slower.

The obvious solution is to get rid of the Python limitation.
Does anyone know the best way to do so?

Re-implement RE? :D
Much happiness would spread throughout the Python community, I am sure :)


Other solutions include:

0. Live with it. Other solutions may cost more time than you are ever going to 
save.

1. DIY: You can easily define your own scanner, using arbitrary Python code. Just make sure you match the token interface the parser expects. String scanning is relatively easy; it just takes a lot of code.
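A minimal sketch of such a hand-written scanner, assuming the interface PLY's parser is documented to use (an `input(data)` method and a `token()` method that returns objects with `type`, `value`, `lineno`, and `lexpos` attributes, then None at end of input). The toy token set here (numbers, identifiers, `+`) is just for illustration:

```python
class Token(object):
    """Minimal token object with the attributes PLY's parser expects."""
    def __init__(self, type, value, lineno, lexpos):
        self.type, self.value = type, value
        self.lineno, self.lexpos = lineno, lexpos
    def __repr__(self):
        return "Token(%s, %r)" % (self.type, self.value)

class SimpleLexer(object):
    """Hand-written scanner for numbers, identifiers and '+'.
    Extend the dispatch in token() for a real language."""
    def input(self, data):
        self.data, self.pos, self.lineno = data, 0, 1

    def token(self):
        data, n = self.data, len(self.data)
        # Skip whitespace, tracking line numbers.
        while self.pos < n and data[self.pos] in " \t\n":
            if data[self.pos] == "\n":
                self.lineno += 1
            self.pos += 1
        if self.pos >= n:
            return None                      # end of input
        start = self.pos
        c = data[start]
        if c.isdigit():
            while self.pos < n and data[self.pos].isdigit():
                self.pos += 1
            return Token("NUMBER", int(data[start:self.pos]), self.lineno, start)
        if c.isalpha() or c == "_":
            while self.pos < n and (data[self.pos].isalnum() or data[self.pos] == "_"):
                self.pos += 1
            return Token("ID", data[start:self.pos], self.lineno, start)
        if c == "+":
            self.pos += 1
            return Token("PLUS", "+", self.lineno, start)
        raise SyntaxError("illegal character %r at position %d" % (c, start))
```

No regexes at all, so the 100-group limit never comes up, and there is no alternation retry loop to pay for.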

2. A long time ago (several years at least), someone wrote a Lex framework. I've forgotten the details, but the mailing list archive or Google can probably help you. IIRC, it was a true lex and took a different approach than using REs.

3. More exotic solutions like writing a scanner as a C extension (generated with 
lex/flex) are also possible.

4. Even more exotic stuff, like generating a DFA somehow and implementing that 
in Python, can be done.
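The table-driven idea looks like this sketch. The states, character classes, and transitions below are hypothetical, chosen just to recognize numbers and identifiers; a real table would be generated from the lexer's regular expressions:

```python
# Table-driven DFA sketch: states and transitions are made up for
# illustration, not generated from any real token rules.

def char_class(c):
    """Map a character to the DFA's input alphabet."""
    if c.isdigit():
        return "digit"
    if c.isalpha() or c == "_":
        return "letter"
    return "other"

# Transition table: state -> {character class -> next state}.
TABLE = {
    "start":  {"digit": "in_num", "letter": "in_id"},
    "in_num": {"digit": "in_num"},
    "in_id":  {"digit": "in_id", "letter": "in_id"},
}
ACCEPTING = {"in_num": "NUMBER", "in_id": "ID"}

def longest_match(text, pos):
    """Run the DFA from `pos`, remembering the last accepting state
    (maximal munch); return (token_type, lexeme_end) or None."""
    state, last, i = "start", None, pos
    while i < len(text):
        nxt = TABLE.get(state, {}).get(char_class(text[i]))
        if nxt is None:
            break
        state = nxt
        i += 1
        if state in ACCEPTING:
            last = (ACCEPTING[state], i)
    return last
```

The appeal is that matching is one table lookup per character regardless of how many token rules you have, so there is no group limit and no split-master retry; the cost is generating the table and the per-character overhead of interpreted Python.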

5. Other Python parser generators may have better solutions (I somewhat doubt it, but it should be easy enough to scan through them and check how their scanners work).


Good luck

Albert

--
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit https://groups.google.com/groups/opt_out.
