R. David Murray <rdmur...@bitdance.com> added the comment:

>>> m = message_from_string("From: John Doe j...@example.com <ot...@example.net>\n\n", policy=default)
>>> m['From'].addresses
(Address(display_name='', username='John Doe jdoe', domain='example.com'),)

The new policies have more error recovery for non-RFC-compliant addresses than 
decode_header, but the two agree in this case.  What is happening here is that 
(1) an unquoted/unencoded '@' is not allowed in a display name, (2) if the 
address is not '<>' quoted, then everything before the @ is the username, and 
(3) in the absence of a comma after the end of the FQDN (which is not allowed 
to contain blanks), any additional tokens are discarded.
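
For anyone who wants to poke at this, here is a minimal sketch that parses a well-formed From header and a malformed one shaped like the example above, then prints the addresses and any defects the parser recorded.  The helper name parse_from and the two mailbox strings are stand-ins of mine (the archive obscures the original addresses), and the output for the malformed case may differ between Python versions, so none is shown.

```python
from email import message_from_string
from email.policy import default


def parse_from(header_value):
    """Parse a From header value with the default policy.

    Hypothetical helper for illustration; returns (addresses, defects).
    """
    msg = message_from_string("From: " + header_value + "\n\n", policy=default)
    header = msg["From"]
    return header.addresses, header.defects


# Well-formed: display name followed by a '<>' quoted addr-spec.
good_addresses, good_defects = parse_from("John Doe <jdoe@example.com>")

# Malformed: an unquoted '@' appears before the '<>' quoted address.
# Both mailbox strings here are assumed stand-ins, not the originals.
bad_addresses, bad_defects = parse_from(
    "John Doe jdoe@example.com <someone@example.net>"
)

for label, addresses, defects in (
    ("well-formed", good_addresses, good_defects),
    ("malformed", bad_addresses, bad_defects),
):
    print(label)
    for addr in addresses:
        print("  address:", repr(addr))
    for defect in defects:
        print("  defect: ", repr(defect))
```

Looking at the defects alongside the addresses makes it easier to see which of the three rules above the parser applied to a given input.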

One could argue that we could treat the blank after the FQDN as a "missing 
comma", and there would be some merit to that argument.  You could also argue 
that a "<>" quoted string would trump the occurrence of the @ earlier in the 
token list.  However, the RFC 822 grammar is designed to be parsed character by 
character, so neither of those would be a typical way for an RFC 822 parser to 
try to do Postel-style error recovery.

So, I don't think there is a bug here, but I'd be curious what other email 
address parsing libraries do, and that could influence whether extensions to 
the "make a guess when the string doesn't conform to the RFC" code would be 
acceptable.
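
As one easy data point for that comparison, the sketch below runs the same kind of malformed value through the legacy helpers in email.utils.  This is only the standard library's older parser, not a third-party one, the input string is an assumed stand-in for the original, and the results are known to vary between Python releases, so no expected output is given.

```python
from email.utils import getaddresses, parseaddr

# Assumed stand-in for the malformed header value discussed above.
value = "John Doe jdoe@example.com <someone@example.net>"

# parseaddr returns a single (realname, email_address) pair.
single = parseaddr(value)

# getaddresses takes a list of header values and returns a list of pairs.
many = getaddresses([value])

print("parseaddr:   ", single)
print("getaddresses:", many)
```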

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34155>
_______________________________________