Ezio Melotti <ezio.melo...@gmail.com> added the comment:

This is a proof that you can have an equivalent regex without including all the 'letter chars' (tested on both narrow and wide builds):

>>> import re, sys
>>> s = u''.join(unichr(c) for c in range(sys.maxunicode))
>>> diff = set(re.findall(u'[^\W\d]', s, re.U)) ^ set(re.findall(u'[%s_-]' % makew(), s, re.U))
>>> diff.remove('-')
>>> re.findall(u'(?:[^\W\d%s]|-)' % ''.join(diff), s, re.U) == re.findall(u'[%s_-]' % makew(), s, re.U)
True

(I don't like the way I included the '-', but I couldn't find anything better.)

However, it looks like most of the time is spent in findall, and a quick benchmark suggests that my regex is slower (even though it is shorter and compiles faster).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8064>
_______________________________________
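
For anyone who wants to repeat the comparison, here is a rough Python 3 sketch of the same idea: take the symmetric difference of the characters matched by the two classes, exclude the leftovers from the negated class, then time compilation and findall separately. makew() is not shown in this message, so make_letters() below is only a hypothetical stand-in that collects every code point with a letter category; build_patterns() and time_pattern() are likewise names made up for this sketch. No timings are claimed here, since they depend on the build and the range scanned.

```python
import re
import sys
import timeit
import unicodedata


def make_letters(limit):
    """Hypothetical stand-in for makew(): every code point below `limit`
    whose Unicode general category is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return ''.join(
        chr(c) for c in range(limit)
        if unicodedata.category(chr(c)).startswith('L')
    )


def build_patterns(limit=sys.maxunicode + 1):
    """Return (haystack, explicit_pattern, compact_pattern) as strings."""
    s = ''.join(chr(c) for c in range(limit))
    explicit = '[%s_-]' % make_letters(limit)

    # Characters matched by exactly one of the two classes.
    diff = set(re.findall(r'[^\W\d]', s)) ^ set(re.findall(explicit, s))
    # '-' is only in the explicit class; it is added back via alternation.
    diff.discard('-')

    # Exclude the leftover characters from the negated class.
    compact = r'(?:[^\W\d%s]|-)' % re.escape(''.join(sorted(diff)))
    return s, explicit, compact


def time_pattern(pattern, s, repeat=3):
    """Best-of-`repeat` timings in seconds: (compile, findall)."""
    def compile_fresh():
        re.purge()  # clear the re cache so the pattern is really recompiled
        re.compile(pattern)

    compile_t = min(timeit.repeat(compile_fresh, number=1, repeat=repeat))
    compiled = re.compile(pattern)
    findall_t = min(
        timeit.repeat(lambda: compiled.findall(s), number=1, repeat=repeat)
    )
    return compile_t, findall_t


# Small range to keep the demo quick; call build_patterns() with no
# argument to scan the full code point range instead.
s, explicit, compact = build_patterns(0x3000)
print(re.findall(compact, s) == re.findall(explicit, s))
for name, pattern in (('explicit', explicit), ('compact', compact)):
    print(name, len(pattern), time_pattern(pattern, s))
```

The re.purge() call matters for the compile timing: without it, the re module would return the cached pattern and the measurement would mostly reflect a dictionary lookup.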