Abhilash Raj <[email protected]> added the comment:
I took a look at the code to see where the fix needs to go, and I could use
some help.
I looked at the parse tree for the header and it looks something like this:
    ContentDisposition([
      Token([ValueTerminal('attachment')]),
      ValueTerminal(';'),
      MimeParameters([
        Parameter([
          Attribute([
            CFWSList([WhiteSpaceTerminal(' ')]),
            ValueTerminal('filename')]),
          ValueTerminal('='),
          Value([
            QuotedString([
              BareQuotedString([
                EncodedWord([ValueTerminal('Schulbesuchsbestättigung.')]),
                WhiteSpaceTerminal(' '),
                EncodedWord([ValueTerminal('pdf')])])])])])])])
The offending piece of code, which seems to be working as designed, is this
loop in get_bare_quoted_string() in email/_header_value_parser.py:
    while value and value[0] != '"':
        if value[0] in WSP:
            token, value = get_fws(value)
        elif value[:2] == '=?':
            try:
                token, value = get_encoded_word(value)
                bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
                    "encoded word inside quoted string"))
            except errors.HeaderParseError:
                token, value = get_qcontent(value)
        else:
            token, value = get_qcontent(value)
        bare_quoted_string.append(token)
It just loops and parses the value one token at a time. We cannot drop the FWS
until we know that the atoms before and after it are both encoded words. I
can't find a clean way to look ahead (which could perhaps be done in
get_parameters()) or look back (which could be done after parsing the entire
bare_quoted_string?) in the parse tree to delete the offending whitespace.
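To make the look-back idea concrete, here is a rough sketch of such a pass. It
does not use the real parser classes: the (token_type, text) tuples and the
function name drop_fws_between_encoded_words() are hypothetical stand-ins that
only mimic the children of the BareQuotedString node shown above.

```python
# Hypothetical stand-ins for the parser's tokens: (token_type, text) pairs.
# They only mimic the shape of the BareQuotedString children; they are not
# the classes from email/_header_value_parser.py.
ENCODED_WORD = "encoded-word"
FWS = "fws"


def drop_fws_between_encoded_words(tokens):
    """Return a new list without whitespace that sits directly between
    two encoded words. All other tokens are kept in their original order."""
    result = []
    last = len(tokens) - 1
    for i, (kind, text) in enumerate(tokens):
        if (kind == FWS
                and 0 < i < last
                and tokens[i - 1][0] == ENCODED_WORD
                and tokens[i + 1][0] == ENCODED_WORD):
            # Both neighbours are encoded words, so skip this whitespace.
            continue
        result.append((kind, text))
    return result


tokens = [
    (ENCODED_WORD, "Schulbesuchsbestättigung."),
    (FWS, " "),
    (ENCODED_WORD, "pdf"),
]
cleaned = drop_fws_between_encoded_words(tokens)
filename = "".join(text for _, text in cleaned)
```

The pass only needs the already-parsed siblings, so in principle it could run
once the loop has consumed the whole quoted string. Whether that is the right
place in the real parser, and how it should interact with the defects list, is
an open question here.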
Any example of this kind of parse-tree manipulation in the code base would be
awesome!
----------
versions: +Python 3.9 -Python 3.5, Python 3.6
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39040>
_______________________________________