Gareth Rees added the comment: This is a well-known gotcha with backtracking regexp implementations. The problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there are some characters (space, apostrophe, double quotes) that match multiple alternatives (for example a space matches both " +" and "[^>]+"). This causes the regexp engine to have to backtrack for each ambiguous character to try out the other alternatives, leading to runtime that's exponential in the number of ambiguous characters.
Linear behaviour can be restored if you make the alternation unambiguous, like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+) ---------- nosy: +Gareth.Rees _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28690> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com