Abhilash Raj <raj.abhila...@gmail.com> added the comment:

I took a look at the code to see where the fix needs to go, and I could 
use some help.

I looked at the parse tree for the header and it looks something like this:

ContentDisposition([
    Token([ValueTerminal('attachment')]),
    ValueTerminal(';'),
    MimeParameters([
        Parameter([
            Attribute([
                CFWSList([WhiteSpaceTerminal(' ')]),
                ValueTerminal('filename')]),
            ValueTerminal('='),
            Value([
                QuotedString([
                    BareQuotedString([
                        EncodedWord([ValueTerminal('Schulbesuchsbestättigung.')]),
                        WhiteSpaceTerminal('    '),
                        EncodedWord([ValueTerminal('pdf')])])])])])])])


The offending piece of code, which seems to be working as designed, is 
get_bare_quoted_string() in email/_header_value_parser.py:

    while value and value[0] != '"':
        if value[0] in WSP:
            token, value = get_fws(value)
        elif value[:2] == '=?':
            try:
                token, value = get_encoded_word(value)
                bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
                    "encoded word inside quoted string"))
            except errors.HeaderParseError:
                token, value = get_qcontent(value)
        else:
            token, value = get_qcontent(value)
        bare_quoted_string.append(token)

It just loops and parses one token at a time. We cannot drop the FWS until we 
know that the tokens before and after it are both encoded words. I can't seem 
to find a clean way to look ahead (perhaps in get_parameters()) or look back 
(perhaps after the entire bare_quoted_string has been parsed) in the parse 
tree to delete the offending whitespace.
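
To make the look-back idea concrete, here is a rough sketch of the kind of 
post-processing pass I have in mind. It does not use the real parser classes: 
Tok is a hypothetical stand-in, and I am assuming the real tokens can be told 
apart by a token_type attribute with values like 'encoded-word' and 'fws'. 
The helper name is made up as well.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a parser token; only token_type matters here."""
    token_type: str  # assumed values: 'encoded-word', 'fws', 'ptext'
    text: str


def drop_fws_between_encoded_words(tokens):
    """Return a new list without whitespace that sits directly between
    two encoded words. All other whitespace is kept."""
    result = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        if (tok.token_type == 'fws'
                and 0 < i < last
                and tokens[i - 1].token_type == 'encoded-word'
                and tokens[i + 1].token_type == 'encoded-word'):
            # Neighbours on both sides are encoded words: skip this FWS.
            continue
        result.append(tok)
    return result


# The token sequence from the parse tree above.
tokens = [
    Tok('encoded-word', 'Schulbesuchsbestättigung.'),
    Tok('fws', '    '),
    Tok('encoded-word', 'pdf'),
]
cleaned = drop_fws_between_encoded_words(tokens)
filename = ''.join(t.text for t in cleaned)

# Whitespace next to ordinary text must survive.
mixed = [Tok('ptext', 'my'), Tok('fws', ' '), Tok('encoded-word', 'file')]
mixed_text = ''.join(t.text for t in drop_fws_between_encoded_words(mixed))
```

The pass looks at the original list rather than the partially filtered one, so 
a run like encoded-word, FWS, encoded-word, FWS, encoded-word loses both 
whitespace tokens. Whether something like this belongs at the end of 
get_bare_quoted_string() or higher up is exactly what I am unsure about.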

Any example of this kind of parse-tree manipulation in the code base would be 
awesome!

----------
versions: +Python 3.9 -Python 3.5, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39040>
_______________________________________