R. David Murray <[email protected]> added the comment:
>>> m = message_from_string("From: John Doe [email protected] <[email protected]>\n\n", policy=default)
>>> m['From'].addresses
(Address(display_name='', username='John Doe jdoe', domain='example.com'),)
The new policies have more error recovery for non-RFC-compliant addresses than
decode_header, but the two agree in this case. Here is what is happening:
(1) an unquoted, unencoded '@' is not allowed in a display name; (2) if the
address is not enclosed in '<>', everything before the '@' is the username;
and (3) in the absence of a comma after the end of the FQDN (which is not
allowed to contain blanks), any additional tokens are discarded.
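To make point (1) concrete, here is a minimal sketch that is not part of the original report and uses hypothetical addresses. When the display name is sent as a quoted-string, the '@' inside it is legal, and the default policy's parser returns the angle-bracketed addr-spec as the address. Going the other way, email.headerregistry.Address quotes a display name that contains specials such as '@' when it is converted to a string. The expected values in the comments assume a Python 3 version that provides email.policy.default.

```python
from email import message_from_string
from email.headerregistry import Address
from email.policy import default

# Hypothetical addresses, for illustration only.
raw = 'From: "Jane Roe jroe@example.net" <jroe@example.net>\n\n'

m = message_from_string(raw, policy=default)
addr = m['From'].addresses[0]

# The display name is a quoted-string, so the '@' inside it is allowed,
# and the '<>'-enclosed addr-spec supplies the username and domain.
print(repr(addr.display_name))  # 'Jane Roe jroe@example.net'
print(addr.username, addr.domain)  # jroe example.net

# Address.__str__ quotes a display name containing specials like '@',
# yielding an RFC-compliant header value.
built = Address(display_name='Jane Roe jroe@example.net',
                username='jroe', domain='example.net')
print(str(built))  # "Jane Roe jroe@example.net" <jroe@example.net>
```

Producing header values through Address (or email.utils.formataddr) rather than string concatenation avoids generating the unquoted form discussed above in the first place.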
One could argue that we could treat the blank after the FQDN as a "missing
comma", and there would be some merit to that argument. You could also argue
that a '<>'-enclosed address should trump the earlier occurrence of the '@' in
the token list. However, the RFC 822 grammar is designed to be parsed character
by character, so that would not be a typical way for an RFC 822 parser to
attempt Postel-style error recovery.
So, I don't think there is a bug here, but I'd be curious what other email
address parsing libraries do, and that could influence whether extensions to
the "make a guess when the string doesn't conform to the RFC" code would be
acceptable.
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34155>
_______________________________________