Jeffrey Kintscher <websur...@surf2c.net> added the comment:

I uploaded a test script with some test cases.

The failure mode occurs when:

1. line folding occurs
2. the first folded line has two or more words with UTF-8 characters
3. subsequent lines contain a word with UTF-8 characters located at a different 
offset than the last encoded substring in the first line

For example, the first folded and encoded line of 'Hello Wörld! Hello Wörld! 
Hello Wörld! Hello Wörld!Hello Wörld!' is

b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?='

and the second line should be

b' Hello =?utf-8?q?W=C3=B6rld!Hello_W=C3=B6rld!?='

but instead, it is

b' Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?='
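A minimal way to check an interpreter for this, sketched with the public email API. This is not the uploaded bpo-36520-test.py; the variable names and the "bug signature" check are my own assumptions, and the exact folding may differ between Python versions:

```python
from email.message import EmailMessage

# Subject taken from the example above (note the missing space in "Wörld!Hello").
subject = ("Hello Wörld! Hello Wörld! Hello Wörld! "
           "Hello Wörld!Hello Wörld!")

# EmailMessage uses email.policy.default, which refolds long headers
# and encodes non-ASCII text as RFC 2047 encoded words.
msg = EmailMessage()
msg["Subject"] = subject
data = msg.as_bytes()

for line in data.splitlines():
    print(line)

# Assumed signature of the failure: one encoded-word prefix nested
# directly inside another one.
has_bug = b"=?utf-8?=?utf-8?" in data
print("bug signature present:", has_bug)
```

On an interpreter with the faulty refolding, the second printed line should show the nested '=?utf-8?=?utf-8?q?' prefix.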

The function at fault is _refold_parse_tree() in 
Lib/email/_header_value_parser.py. In the first line, it encodes the first 
UTF-8 word and saves the starting offset in the output string (15). When it 
encounters the second UTF-8 word, it re-encodes the entire string starting at 
the saved offset. This is to help reduce the bloat added by multiple 
'=?utf-8?q?' start-of-encoding tokens. When it encodes the first UTF-8 word on 
the second line, it tries to store it at the saved offset into the second line 
output string, but that is past the end of the string so it just gets appended. 
When it encounters the second UTF-8 word in the second line, it re-encodes the 
entire second-line string starting at the saved offset (15), which is in the 
middle of the first encoded UTF-8 string.
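To make the offset arithmetic concrete, here is a small sketch using plain string slicing. The intermediate strings are reconstructed from the description above rather than captured from _refold_parse_tree(), so treat them as assumptions:

```python
# State of the first line right after its first UTF-8 word was encoded.
first_line = "Subject: Hello =?utf-8?q?W=C3=B6rld!?="
saved_offset = first_line.index("=?utf-8?q?")
print(saved_offset)  # 15

# Second line before its first UTF-8 word: the saved offset is past the
# end, so the slice is empty and the encoded word is simply appended.
second_line = " Hello "
print(repr(second_line[saved_offset:]))  # ''
second_line += "=?utf-8?q?W=C3=B6rld!Hello?="

# Second UTF-8 word: the stale offset now cuts the encoded word in two.
kept = second_line[:saved_offset]
tail = second_line[saved_offset:]
print(repr(kept))  # ' Hello =?utf-8?'
print(repr(tail))  # 'q?W=C3=B6rld!Hello?='
```

The tail is no longer a valid encoded word, so it is re-encoded as literal text, which is where the '=3F' and '=3D' escapes in the bad output come from.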

The failure mode is not triggered if there is at most one UTF-8 word in each 
folded line. It also is not triggered when folding occurs in the middle of a 
word instead of at whitespace because the code follows a different path.

The solution is to reset the saved starting offset to None whenever a new 
folded line is started and the fold point is whitespace.
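A toy model of the idea, not the actual patch. The helpers q_encode(), decode_tail() and toy_refold() are hypothetical names of mine, the fold points are supplied by hand instead of being computed from a maximum line length, and the offset is reset on every new line rather than only at whitespace fold points. email.quoprimime stands in for the encoder the real code uses:

```python
import re
from email.quoprimime import header_decode, header_encode

ENCODED_WORD = re.compile(r"=\?utf-8\?q\?([^?]*)\?=(.*)")


def q_encode(text):
    """Wrap text in a single utf-8 Q encoded word."""
    return header_encode(text.encode("utf-8"), charset="utf-8")


def decode_tail(tail):
    """Decode a leading encoded word plus any trailing plain text.

    If the tail does not start with a complete encoded word, return it
    unchanged so that it is treated as literal text.
    """
    match = ENCODED_WORD.fullmatch(tail)
    if match is None:
        return tail
    word = header_decode(match.group(1)).encode("latin-1").decode("utf-8")
    return word + match.group(2)


def toy_refold(folded_lines, reset_on_fold):
    """Encode pre-folded lines, merging encoded words from a saved offset."""
    out = []
    last_ew = None  # offset of the first encoded word in the current line
    for text in folded_lines:
        if reset_on_fold:
            last_ew = None  # proposed fix: forget the previous line's offset
        line = ""
        for token in re.findall(r"\s+|\S+", text):
            if token.isascii():
                line += token  # whitespace and ASCII words are copied as-is
                continue
            if last_ew is None:
                last_ew = len(line)
                to_encode = token
            else:
                # Re-encode everything from the saved offset onward.
                to_encode = decode_tail(line[last_ew:]) + token
                line = line[:last_ew]
            line += q_encode(to_encode)
        out.append(line)
    return out


folded = [
    "Subject: Hello Wörld! Hello Wörld! Hello Wörld!",
    " Hello Wörld!Hello Wörld!",
]
buggy = toy_refold(folded, reset_on_fold=False)
fixed = toy_refold(folded, reset_on_fold=True)
print(buggy[1])
print(fixed[1])
```

With reset_on_fold=False the second line comes out with the nested '=?utf-8?=?utf-8?q?' prefix; with reset_on_fold=True it matches the expected second line quoted earlier.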

I will submit a pull request soon with a fix.

----------
Added file: https://bugs.python.org/file48366/bpo-36520-test.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36520>
_______________________________________