Matthew Barnett added the comment:

I found another bug while looking through the source.
On line 495, in function SRE_COUNT:

    if (maxcount < end - ptr && maxcount != 65535)
        end = ptr + maxcount*state->charsize;

where 'end' and 'ptr' are of type 'char*'. That means that 'end - ptr' is the length in _bytes_, not characters. If the byte after the end of the string is 0, then you get this:

>>> # Good:
>>> re.search(r"\x00{1,3}", "a\x00\x00").span()
(1, 3)
>>> # Bad:
>>> re.search(r"\x00{1,3}", "\u0100\x00\x00").span()
(1, 4)

I'll keep looking before submitting a patch.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________
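To make the unit mismatch concrete, here is a small Python model of the clamp in SRE_COUNT. It is a sketch under stated assumptions: the C pointers are modelled as integer byte offsets, the `maxcount != 65535` sentinel check is left out, and the helper names (`clamp_end_buggy`, `clamp_end_fixed`, `chars_scanned`) are hypothetical. The "fixed" variant, which converts the byte distance to characters before comparing, is one possible correction and not necessarily the patch that was eventually submitted.

```python
def clamp_end_buggy(ptr: int, end: int, maxcount: int, charsize: int) -> int:
    """Model of the reported check: ptr/end are byte offsets, maxcount counts chars."""
    # end - ptr is a distance in bytes, but maxcount is a number of characters.
    if maxcount < end - ptr:
        end = ptr + maxcount * charsize
    return end


def clamp_end_fixed(ptr: int, end: int, maxcount: int, charsize: int) -> int:
    """Hypothetical fix: convert the byte distance to characters before comparing."""
    if maxcount < (end - ptr) // charsize:
        end = ptr + maxcount * charsize
    return end


def chars_scanned(ptr: int, end: int, charsize: int) -> int:
    """Number of characters between two byte offsets."""
    return (end - ptr) // charsize


# "\u0100\x00\x00" needs 2 bytes per character, so its data is 6 bytes long.
CHARSIZE = 2
data_end = 3 * CHARSIZE  # byte offset one past the last character
ptr = 1 * CHARSIZE       # the match attempt starts at character index 1
maxcount = 3             # upper bound from the {1,3} quantifier

# Buggy: 3 < 4 (bytes), so end moves to byte 8, one character past the data.
print(chars_scanned(ptr, clamp_end_buggy(ptr, data_end, maxcount, CHARSIZE), CHARSIZE))  # 3
# Fixed: 3 < 2 (chars) is false, so end stays at the real end of the string.
print(chars_scanned(ptr, clamp_end_fixed(ptr, data_end, maxcount, CHARSIZE), CHARSIZE))  # 2
```

With one byte per character, as in "a\x00\x00", bytes and characters coincide and both variants agree. This is why only the wider string over-reads into the trailing 0 byte and reports a span of (1, 4).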