[issue8064] Large regex handling very slow on Linux

Ezio Melotti Thu, 04 Mar 2010 18:16:29 -0800

Ezio Melotti <[email protected]> added the comment:

This shows that you can get an equivalent regex without listing all the
'letter chars' (tested on both narrow and wide builds):
>>> import re, sys
>>> s = u''.join(unichr(c) for c in range(sys.maxunicode))
>>> diff = (set(re.findall(u'[^\W\d]', s, re.U)) ^
...         set(re.findall(u'[%s_-]' % makew(), s, re.U)))
>>> diff.remove('-')
>>> (re.findall(u'(?:[^\W\d%s]|-)' % ''.join(diff), s, re.U) ==
...  re.findall(u'[%s_-]' % makew(), s, re.U))
True


(I don't like the way I included the '-', but I couldn't find anything better.)
It looks, however, as though most of the time is spent in the findall, and a
quick benchmark suggests that my regex is slower (even though it is shorter and
compiles faster).
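
For anyone who wants to repeat this kind of measurement, here is a rough sketch
of how compile time and findall time could be timed separately. It is not the
benchmark mentioned above. It assumes Python 3 (where str patterns are Unicode
by default), a reduced code point range (LIMIT) to keep the run short, and a
hypothetical stand-in for makew(): the explicit class is simply built from the
characters that the short pattern matches. The measure() helper is also made up
for this illustration.

```python
import re
import time

# Sample text: every code point below an arbitrary cut-off. LIMIT is an
# assumption chosen to keep the run short.
LIMIT = 0x3000
s = ''.join(chr(c) for c in range(LIMIT))

short_pat = r'[^\W\d]'
# Hypothetical stand-in for makew(): an explicit class that lists every
# character the short pattern matches, so both patterns are equivalent on s.
long_pat = '[%s]' % ''.join(re.findall(short_pat, s))


def measure(pattern, text, repeat=5):
    """Return (compile_seconds, findall_seconds, matches), best of `repeat`."""
    compile_times, findall_times = [], []
    for _ in range(repeat):
        re.purge()  # clear the re module's cache so compile() does real work
        t0 = time.perf_counter()
        rx = re.compile(pattern)
        t1 = time.perf_counter()
        matches = rx.findall(text)
        t2 = time.perf_counter()
        compile_times.append(t1 - t0)
        findall_times.append(t2 - t1)
    return min(compile_times), min(findall_times), matches


short_compile, short_find, short_hits = measure(short_pat, s)
long_compile, long_find, long_hits = measure(long_pat, s)

print('short: compile %.6fs, findall %.6fs' % (short_compile, short_find))
print('long:  compile %.6fs, findall %.6fs' % (long_compile, long_find))
```

The absolute numbers depend on the interpreter version, the build and the
machine, so only the relative split between compile and findall is worth
comparing.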

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8064>
_______________________________________