On 2017-12-26 07:01, Yogev Hendel wrote:
I don't know if this is the right place to put this,
but I've found the following lines of code results in an incredibly long
processing time.
Perhaps it might be of use to someone.
/import re/
/pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/
/pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')/
The pattern has a repeated repeat, which results in catastrophic
backtracking.
As an example, think about how the pattern (?:a+)+b would try to match
the string 'aaac'.
Match 'aaa', but not 'c'.
Match 'aa' and 'a', but not 'c'.
Match 'a' and 'aa', but not 'c'.
Match 'a' and 'a' and 'a', but not 'c'.
That's 4 failed attempts.
Now try match the string 'aaaac'.
Match 'aaaa', but not 'c'.
Match 'aaa' and 'a', but not 'c'.
Match 'aa' and 'aa', but not 'c'.
Match 'aa' and 'a a', but not 'c'.
Match 'a' and 'aaa', but not 'c'.
Match 'a' and 'aa' and 'a', but not 'c'.
Match 'a' and 'a aa', but not 'c'.
Match 'a' and 'a a' and 'a', but not 'c'.
That's 8 failed attempts.
Each additional 'a' in the string to match will double the number of
attempts.
Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has
a '.', which the pattern can't match, but it'll keep trying until it
finally fails.
If you add just 1 more 'a' or '-' to the string, it'll take twice as
long as it does now.
You need to think more carefully about how the pattern matches and what
it'll do when it doesn't match.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com