Tom Lynn <tl...@users.sourceforge.net> added the comment:

The only difference between the two regexps is that the email/header.py
version looks for::

    (?=[ \t]|$)   # whitespace or the end of the string

at the end (with re.MULTILINE, so $ also matches just before a '\n').

To expand on "There is nothing about that thing in RFC 2047", it says::

    IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
    by an RFC 822 parser.

RFC 822 says::

    atom        =  1*<any CHAR except specials, SPACE and CTLs>
    ...
    specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                /  "," / ";" / ":" / "\" / <">  ;  string, to use
                /  "." / "[" / "]"              ;  within a word.

So an example of mis-parsing is::

    >>> import email.header
    >>> h = '=?utf-8?q?=E2=98=BA?=(unicode white smiling face)'
    >>> email.header.decode_header(h)
    [('=?utf-8?q?=E2=98=BA?=(unicode white smiling face)', None)]

The correct result would be::

    >>> email.header.decode_header(h)
    [('\xe2\x98\xba', 'utf-8'), ('(unicode white smiling face)', None)]

which is what you get if you insert a space before the '(' in h.

----------
nosy: +tlynn

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1079>
_______________________________________
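
To illustrate the lookahead difference described in the comment, here is a minimal sketch. It does not reproduce the actual source of either regexp: the pattern body is a simplified assumption, and the names LOOSE, STRICT and decode_q are hypothetical, chosen only for this example. It shows that the trailing lookahead alone is enough to reject an encoded-word that is immediately followed by '('.

```python
import quopri
import re

# Simplified body of an RFC 2047 encoded-word: =?charset?encoding?text?=
# (an assumption for illustration, not copied from the standard library).
_BODY = r'=\?(?P<charset>[^?]*?)\?(?P<encoding>[qb])\?(?P<encoded>.*?)\?='

# Hypothetical names: LOOSE has no trailing lookahead, STRICT adds the
# "(?=[ \t]|$)" lookahead quoted in the comment.
LOOSE = re.compile(_BODY, re.IGNORECASE | re.MULTILINE)
STRICT = re.compile(_BODY + r'(?=[ \t]|$)', re.IGNORECASE | re.MULTILINE)

h = '=?utf-8?q?=E2=98=BA?=(unicode white smiling face)'

strict_match = STRICT.search(h)  # '(' follows the closing '?=', so no match
loose_match = LOOSE.search(h)    # matches just the encoded-word


def decode_q(match):
    """Decode a 'q' encoded-word match to text ('b' is not handled here)."""
    payload = match.group('encoded').encode('ascii')
    raw = quopri.decodestring(payload, header=True)
    return raw.decode(match.group('charset'))


print(strict_match)                                        # None
print(ascii(decode_q(loose_match)), h[loose_match.end():])
```

With a space inserted before the '(' in h, STRICT matches as well, which is consistent with the behaviour reported in the comment.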