[issue30701] Exception parsing certain invalid email address headers
Tim Bell added the comment: This appears to be the same issue as subsequently reported in #34155. -- ___ Python tracker <https://bugs.python.org/issue30701> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Tim Bell added the comment: I've addressed the points in the last few comments and created a new PR (10783). -- ___ Python tracker <https://bugs.python.org/issue30681> ___
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Change by Tim Bell : -- keywords: +patch pull_requests: +10025 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue30681> ___
[issue30988] Exception parsing invalid email address headers starting or ending with dot
Changes by Tim Bell <timothyb...@gmail.com>: -- pull_requests: +2862 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30988> ___
[issue30988] Exception parsing invalid email address headers starting or ending with dot
New submission from Tim Bell:

Email addresses with a display name starting with a dot ("."), or ending with a dot without whitespace before the angle bracket, trigger exceptions when the header is accessed if the message object was created with the "default" policy. For example:

>>> import email
>>> from email.policy import default
>>> email.message_from_bytes(b'To: . Doe <j...@example.com>')['to']
'. Doe <j...@example.com>'
>>> email.message_from_bytes(b'To: . Doe <j...@example.com>',
... policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Users/bhat/git/cpython/Lib/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in parse
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in <listcomp>
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 834, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 768, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 931, in display_name
    if res[0][0].token_type == 'cfws':
AttributeError: 'str' object has no attribute 'token_type'
>>>
>>> email.message_from_bytes(b'To: John X.<j...@example.com>')['to']
'John X.<j...@example.com>'
>>> email.message_from_bytes(b'To: John X.<j...@example.com>',
... policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Users/bhat/git/cpython/Lib/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in parse
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in <listcomp>
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 834, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 768, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 936, in display_name
    if res[-1][-1].token_type == 'cfws':
AttributeError: 'str' object has no attribute 'token_type'

--
components: email
messages: 298836
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: Exception parsing invalid email address headers starting or ending with dot
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7
___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30988> ___
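On affected versions, code that must read headers from malformed messages can guard the access itself. One defensive pattern, sketched here under assumptions and not taken from the report, tries the "default" policy first and falls back to the compat32 policy, which returns the header text unparsed. The helper name header_or_raw is hypothetical, and catching the broad Exception class is a deliberate choice, since the failures above surface as ordinary AttributeError (or IndexError) exceptions rather than a dedicated email error.

```python
import email
from email.policy import compat32, default
from typing import Optional


def header_or_raw(raw: bytes, name: str) -> Optional[str]:
    """Hypothetical helper: return a header parsed under policy=default,
    or the unparsed compat32 value if the structured parser raises.

    Returns None if the header is absent.
    """
    try:
        value = email.message_from_bytes(raw, policy=default)[name]
    except Exception:
        # Assumption: any parser failure (AttributeError, IndexError, ...)
        # should degrade to the raw header text rather than propagate.
        value = email.message_from_bytes(raw, policy=compat32)[name]
    return None if value is None else str(value)
```

On a well-formed header such as b'To: John Doe <jdoe@example.com>' the helper returns the same string the default policy gives; on the dotted display names above it returns a string instead of raising, whichever behaviour the running Python version has. The trade-off is that a silent fallback hides the problem, so callers who care may want to log when it happens.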
[issue30701] Exception parsing certain invalid email address headers
Tim Bell added the comment: I'm using the email package to ingest a firehose of spam; spammers aren't known for following norms or standards, so it's not surprising that I'm discovering lots of edge cases. I'll supply fixes for what I find where I can, time permitting. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30701> ___
[issue30701] Exception parsing certain invalid email address headers
New submission from Tim Bell:

According to RFC 5322, an email address like this isn't valid:

u...@example.com <u...@example.com>

(The display-name "u...@example.com" contains "@", which isn't in the set of atext characters used to form an atom.)

How it's handled by the email package varies by policy:

>>> import email
>>> from email.policy import default
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>')['to']
'u...@example.com <u...@example.com>'
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>',
... policy=default)['to']
'u...@example.com'
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>',
... policy=default).defects
[]

The difference between the behaviour under the compat32 and "default" policies may or may not be significant. However, if it is coupled with a further invalid feature, namely a space after the ">", here's what happens:

>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ')['to']
'u...@example.com <u...@example.com> '
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ',
... policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 337, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 328, in value_parser
    address_list, value = parser.get_address_list(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2368, in get_address_list
    token, value = get_invalid_mailbox(value, ',')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2166, in get_invalid_mailbox
    token, value = get_phrase(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1770, in get_phrase
    token, value = get_word(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1745, in get_word
    if value[0]=='"':
IndexError: string index out of range
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ',
... policy=default).defects
[]

I believe that the preferred behaviour would be to add a defect to the message object during parsing instead of throwing an exception when the invalid header value is accessed.

--
components: email
messages: 296309
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: Exception parsing certain invalid email address headers
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7
___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30701> ___
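The RFC 5322 rule cited above can be checked mechanically. The sketch below, with hypothetical function names, tests whether an unquoted display name is a sequence of whitespace-separated atoms made only of atext characters. It is deliberately simplified for illustration: quoted-strings, comments, encoded words and the obsolete phrase syntax (obs-phrase) are not handled, although a real parser such as the email package's must deal with all of them.

```python
import string

# atext as defined in RFC 5322, section 3.2.3: letters, digits and these symbols.
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atom(word: str) -> bool:
    """True if word is a non-empty run of atext characters (CFWS already stripped)."""
    return bool(word) and all(ch in ATEXT for ch in word)


def is_simple_phrase(display_name: str) -> bool:
    """Hypothetical check: True if display_name is one or more atoms separated
    by whitespace. Quoted-strings, comments and obs-phrase dots are NOT handled."""
    words = display_name.split()
    return bool(words) and all(is_atom(word) for word in words)
```

With these definitions is_simple_phrase('John Doe') is True, while a display name such as 'jdoe@example.com' is rejected because "@" is not atext (the "." would also fail this simplified check); RFC 5322 requires a display name containing "@" to be written as a quoted-string.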
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Tim Bell added the comment: I've updated the pull request to incorporate Barry's suggestion of a new defect for this situation, InvalidDateDefect. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30681> ___
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Tim Bell added the comment: Thanks for the feedback. I've made a new pull request which addresses the points raised. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30681> ___
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Changes by Tim Bell <timothyb...@gmail.com>: -- pull_requests: +2304 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30681> ___
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Tim Bell added the comment: My proposed solution (in https://github.com/python/cpython/pull/2229) has two parts:

1. change parsedate_to_datetime() to return None rather than raising an exception; and
2. change headerregistry.DateHeader.parse() to check for None being returned from parsedate_to_datetime() and to add a defect; the datetime attribute is set to None (as if the Date header were missing), but the header still evaluates as a string to the supplied header value.

I'm not sure what the use case is for distinguishing between a missing Date header and an invalid date value, but couldn't the two be distinguished by the different defects added to the header?

In any case, if I'm not fully grasping the context and parsedate_to_datetime() should continue to throw exceptions, then a slightly different modification to DateHeader to catch those exceptions would seem sensible, and would address my use case.

-- ___ Python tracker <https://bugs.python.org/issue30681> ___
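The two-part proposal can be illustrated outside the email package. This is a sketch under assumptions rather than the code in the pull request: parsedate_or_none, parse_date_value and the DateValue result type are hypothetical names, and the generic email.errors.HeaderDefect stands in for whichever defect class the patch actually adds.

```python
from datetime import datetime
from email.errors import HeaderDefect
from email.utils import parsedate_to_datetime
from typing import List, NamedTuple, Optional


def parsedate_or_none(value: str) -> Optional[datetime]:
    """Part 1 (sketch): like parsedate_to_datetime(), but return None on failure."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # TypeError: unparseable input on versions that unpack a None result;
        # ValueError: fields out of range (e.g. hour 27), or unparseable input
        # on versions that raise ValueError for it.
        return None


class DateValue(NamedTuple):
    """Hypothetical stand-in for the state a Date header object would carry."""
    value: str                    # string value: always the supplied text here
    dt: Optional[datetime]        # mirrors DateHeader.datetime; None if invalid
    defects: List[HeaderDefect]   # defects recorded while parsing


def parse_date_value(value: str) -> DateValue:
    """Part 2 (sketch): keep the text, set the datetime to None, add a defect."""
    dt = parsedate_or_none(value)
    if dt is None:
        return DateValue(value, None, [HeaderDefect('invalid date value or format')])
    return DateValue(value, dt, [])
```

As in the proposal, the string value always survives; a caller that needs to tell an invalid date from a missing one can look at the recorded defects rather than at the datetime field alone.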
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
Changes by Tim Bell <timothyb...@gmail.com>: -- pull_requests: +2274 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30681> ___
[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed
New submission from Tim Bell:

Python 3.6 documentation for email.utils.parsedate_to_datetime() says "Performs the same function as parsedate(), but on success returns a datetime." The docs for parsedate() say "If it succeeds in parsing the date...; otherwise None will be returned." By implication, parsedate_to_datetime() should return None when the date can't be parsed.

There are two different failure modes for parsedate_to_datetime():

1. When _parsedate_tz() fails to parse the date and returns None:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 210, in parsedate_to_datetime
    *dtuple, tz = _parsedate_tz(data)
TypeError: 'NoneType' object is not iterable

2. When _parsedate_tz() succeeds, but conversion to datetime.datetime fails:

>>> parsedate_to_datetime('Tue, 06 Jun 2017 27:39:33 +0600')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23

Note that this second case is the one that led me to this issue. I am using the email package to parse spam emails for subsequent analysis, and a certain group of spam emails contain invalid hour fields in their Date header. I don't require the invalid Date header to be converted to a datetime.datetime, but merely reading email_message['date'] to get the header value as a string triggers the ValueError exception. I can work around this with a custom email policy, but the observed behaviour does seem to contradict the documented behaviour.

Also, in relation to https://bugs.python.org/issue15925, r.david.murray commented "Oh, and I'm purposely allowing parsedate_to_datetime throw exceptions. I suppose that should be documented, but that's a separate issue."
However, no argument was given for why it is desirable for parsedate_to_datetime to throw exceptions.

--
components: email
messages: 296137
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: email.utils.parsedate_to_datetime() should return None when date cannot be parsed
type: behavior
versions: Python 3.5, Python 3.6
___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30681> ___
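The custom-policy workaround mentioned in the report might look something like the following. It is a sketch under assumptions, not the reporter's code: LenientDateHeader and lenient_policy are hypothetical names, and the parse() override relies on how DateHeader.parse() fills its kwds dictionary ('decoded', 'datetime', 'parse_tree', 'defects'), which is an implementation detail rather than a documented contract. On Python versions where DateHeader already handles invalid dates itself, the fallback branch is simply never taken.

```python
import email
from email.errors import HeaderDefect
from email.headerregistry import HeaderRegistry, UniqueDateHeader
from email.policy import default


class LenientDateHeader(UniqueDateHeader):
    """Hypothetical Date header class that records a defect instead of raising."""

    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except (TypeError, ValueError):
            # Assumption: these kwds keys mirror DateHeader.parse() internals.
            kwds['decoded'] = value                       # header still reads as the supplied text
            kwds['datetime'] = None                       # as if the Date header were missing
            kwds['parse_tree'] = cls.value_parser(value)  # unstructured token list
            kwds['defects'].append(HeaderDefect('invalid date value or format'))


# Start from the default header map and replace only the Date handler.
registry = HeaderRegistry()
registry.map_to_type('date', LenientDateHeader)
lenient_policy = default.clone(header_factory=registry)
```

Reading msg['date'] from a message parsed with lenient_policy then yields the original text, a datetime of None and at least one defect, instead of an exception. Swapping the header class through HeaderRegistry keeps the change local to the Date header rather than patching email.utils.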